Risk Assessment in AI Projects

How do you assess risks in AI projects? AI and data science projects share many risks with classical software projects, but they also carry specific risks you should be aware of. AI projects typically fall into one of the following four risk classes, ranging from comparatively low to very high risk:

Risk Class 1 (Low Risk)

Using a Pre-Trained AI Model in its Native Domain
In this scenario, neither model training nor training data is required. Nonetheless, you need a labeled dataset of sufficient quality to validate model quality in a principled manner. Check the pre-trained model's published accuracy metrics, subtract a safety margin, and compare the result to your requirements. Compared to implementing off-the-shelf software, risk is slightly elevated, because production input data must match the model's training data not only structurally but also statistically. Examples of data science projects in risk class 1 include machine translation, face recognition and face detection, and the detection of person names, organization names, and geographical locations in natural language text (named-entity recognition).
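Both checks from the paragraph above, the published-accuracy comparison and the statistical match between training and production data, can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names and example thresholds are hypothetical, and the two-sample Kolmogorov-Smirnov (KS) test on a single numeric input feature is just one possible way to compare distributions.

```python
import math
from bisect import bisect_right


def passes_published_accuracy(published_accuracy, safety_margin, required_accuracy):
    """Published accuracy, reduced by a safety margin, must still meet the requirement."""
    return published_accuracy - safety_margin >= required_accuracy


def validation_accuracy(predictions, labels):
    """Accuracy of the pre-trained model on your own labeled validation set."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two numeric samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the observed values
        gap = max(gap, abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)))
    return gap


def distribution_shift_detected(training_values, production_values, alpha=0.05):
    """Flag a shift if the KS statistic exceeds its large-sample critical value."""
    n, m = len(training_values), len(production_values)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(training_values, production_values) > critical


# Hypothetical numbers: 95% published, 3-point safety margin, 90% required.
print(passes_published_accuracy(0.95, 0.03, 0.90))  # True
print(validation_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(distribution_shift_detected(list(range(50)), list(range(100, 150))))  # True
```

In practice you would run the shift check per input feature on a sample of real production data; for many features or large samples, a library implementation such as `scipy.stats.ks_2samp` is the more practical choice.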

Risk Class 2 (Medium Risk)

Transferring a Pre-Trained AI Model to a Related Domain
If no pre-trained model is available for your task, the next best option from a risk perspective is often to fine-tune an existing foundation model. AI models, specifically the artificial neural networks associated with deep learning, can be pre-trained on large cross-domain datasets, yielding foundation models that can then be fine-tuned to your specific domain using a smaller, domain-specific dataset. This represents an elevated risk compared to using a pre-trained model directly, as you need to supply your own training data, which must be of sufficient quality for the task at hand. While this training dataset can be considerably smaller than what would be needed to train a model from scratch, fine-tuning deep learning models for language or vision tasks still requires thousands to tens of thousands of training examples to reach high accuracy. Typical examples of projects in this risk class include vision tasks such as image classification and object detection, and language tasks such as machine translation of domain-specific texts.
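The mechanics of fine-tuning can be sketched in PyTorch. In this minimal example, a small randomly initialized network stands in for a pre-trained backbone (an assumption for illustration; in practice you would load published weights), its parameters are frozen, and only a newly added classification head is trained on a small synthetic dataset. Freezing the backbone is one common variant; full fine-tuning instead updates all weights, typically with a lower learning rate.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pre-trained foundation model backbone (assumption: in a real
# project you would load published pre-trained weights here instead).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False  # freeze the pre-trained weights

head = nn.Linear(16, 3)  # new task-specific head for 3 hypothetical domain classes
model = nn.Sequential(backbone, head)

# Small synthetic "domain-specific" dataset, for illustration only.
X = torch.randn(200, 32)
y = torch.randint(0, 3, (200,))

snapshot = [p.detach().clone() for p in backbone.parameters()]
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)  # only the head is optimized
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
backbone_unchanged = all(torch.equal(a, b) for a, b in zip(snapshot, backbone.parameters()))
print(trainable, total, backbone_unchanged)  # 51 3203 True
```

Here only 51 of 3,203 parameters are updated; real foundation models have millions to billions of parameters in the backbone, which is what makes reusing them attractive.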

Risk Class 3 (High Risk)

Training an Existing AI Model Architecture on Proprietary Data
If no foundation model applicable to your task exists, you will need to resort to training a model from scratch. To reduce risk, you should stick to established model architectures with a proven track record on related tasks. Judging how well a model architecture fits a task without extensive experimentation is still an art that depends on considerable experience. Moreover, training a model from scratch requires a sufficient volume of training data and significant effort for hyperparameter tuning, model-specific data preprocessing (e.g. outlier detection, data imputation, data augmentation, and data transformation), validation, and quality assurance. Even more critically, it is often not known from the outset whether your training data is of sufficient quality to solve your task with adequate accuracy at all. For all these reasons, AI projects of this risk class should be considered research projects and managed as such. Plan for a project duration of 3 years. Expect considerable risk of delays or project failure, especially when you encounter training data quality problems. Risks increase further if your task involves less common learning paradigms (e.g. positive-unlabeled (PU) learning) and decrease if you are working with well-established statistical or machine learning methods (generalized linear models, support vector machines, graphical models, Bayesian networks, decision tree ensembles, etc.) on high-quality structured data. Risk mitigation measures should include data quality analyses and feasibility analyses (proof of value).
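A first-pass data quality analysis of the kind recommended above can start very simply. The following standard-library sketch assumes records arrive as a list of dictionaries with a hypothetical label field; it reports missing-value ratios, exact duplicate rows, and the label distribution, and flags numeric outliers with the common 1.5×IQR rule. It is a starting point under these assumptions, not a substitute for domain-specific checks.

```python
from collections import Counter
from statistics import quantiles


def data_quality_report(rows, label_key):
    """Basic quality indicators for a list of records (dicts), e.g. from a CSV export."""
    n = len(rows)
    columns = sorted({key for row in rows for key in row})
    return {
        "n_rows": n,
        # Share of records where a column is absent or None.
        "missing_ratio": {c: sum(row.get(c) is None for row in rows) / n for c in columns},
        # Exact duplicates (assumes hashable field values).
        "duplicate_rows": n - len({tuple(sorted(row.items())) for row in rows}),
        # Strong class imbalance is an early warning sign for supervised learning.
        "label_counts": dict(Counter(row.get(label_key) for row in rows)),
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


records = [
    {"amount": 12.0, "label": "ok"},
    {"amount": None, "label": "ok"},
    {"amount": 12.0, "label": "ok"},
    {"amount": 950.0, "label": "fraud"},
]
print(data_quality_report(records, label_key="label"))
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # [100]
```

Checks like these are cheap to run before committing to a project plan, compared to discovering data problems mid-project.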

Risk Class 4 (Very High Risk)

Custom Development of an Application-Specific AI Model Architecture
If your task cannot be solved by an established model architecture, you will need to set up a research and development project of often considerable complexity, scale, and risk. If successful, the result can be intellectual property of very high value, especially if it is based on proprietary training data. Application-specific model architectures are somewhat common in deep learning, as models are built from composable submodules. Developing custom model architectures within the framework of inferential statistics or graphical models (including Bayesian networks) is also common and useful if your team has the required specialized expertise. From a risk perspective, however, developing a custom AI model architecture should be a measure of last resort. Plan for a research project duration of 3 to 5 years, and expect considerable risk. Highly specialized knowledge in machine learning or statistics will be required. Examples of AI projects in this risk class include protein structure prediction and custom robotics.
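To illustrate what "composable submodules" means in deep learning practice, here is a hypothetical PyTorch sketch that combines an encoder for numeric features with a bag-of-tokens text encoder and a shared classification head. The class name, dimensions, and input types are invented for illustration; real application-specific architectures are far more involved.

```python
import torch
from torch import nn


class FusionClassifier(nn.Module):
    """Hypothetical custom architecture assembled from standard submodules."""

    def __init__(self, n_numeric, vocab_size, n_classes, hidden=32):
        super().__init__()
        # Submodule 1: encoder for structured numeric features.
        self.numeric_encoder = nn.Sequential(nn.Linear(n_numeric, hidden), nn.ReLU())
        # Submodule 2: averages the token embeddings of a text field.
        self.text_encoder = nn.EmbeddingBag(vocab_size, hidden, mode="mean")
        # Submodule 3: head operating on the concatenated representations.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, numeric, tokens):
        features = torch.cat([self.numeric_encoder(numeric), self.text_encoder(tokens)], dim=1)
        return self.head(features)


model = FusionClassifier(n_numeric=10, vocab_size=1000, n_classes=3)
logits = model(torch.randn(4, 10), torch.randint(0, 1000, (4, 7)))
print(logits.shape)  # torch.Size([4, 3])
```

The code itself is short; the risk lies in finding out experimentally whether a given composition actually solves the task.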

Conclusions

Commercial AI vendors often downplay project risks, whether out of ignorance or malice. This seems to be especially prevalent among vendors of AI-based off-the-shelf software for overly broad yet concrete-sounding application areas like "fraud detection", "predictive maintenance", or "anomaly detection". Be aware that a presumed class 1 project may very well turn into a failed class 3 project if your use case does not exactly match the vendor's reference!