Monitoring & Maintenance

From The Foundation for Best Practices in Machine Learning



Objective
To ensure that Products and Models remain within acceptable operational bounds.


15.1. Product Definition(s)

Objective
To (a) track Model performance in production; and (b) ensure desired Model performance.
Item nr. Item Name and Page Control Aim
15.1.1. Monitoring Objectives

A managerial committee ought to be established to (a) oversee Organisation Machine Learning and Product(s); and (b) warrant their effective alignment in accordance with Organisation strategies, business requirements, Corporate Governance Principles, Social Corporate Responsibilities, legal regulations and Ethical Practices.

To ensure clear managerial responsibility for, oversight of, and custody of Organisation Machine Learning and Product(s).

15.1.2. Monitoring Risks

Document and assess the associated risks of failing to achieve Monitoring Objectives.

To define Product and Model monitoring risks.

15.2. Exploration

Objective
To (a) define robust Product and/or Model monitoring requirements, inclusive of concerns related to Features and skews of the data; and (b) ensure the continued monitoring of Products and/or Models throughout their lifecycles.
Item nr. Item Name and Page Control Aim
15.2.1. Data Source Mismatch: Training & Production Data

Define and deploy methods to detect the degree to which data sources and Features, in Model training and production data, match one another. If mismatch is detected, take measures to ensure that data sources and Features are adequately matched in both Model training and production data.

To (a) reduce nonsensical predictions of the Model due to (i) missing data, (ii) lack of data incorporated, or (iii) data measurement scaling, encoding and/or meaning; (b) reduce the discrepancy between training and production data; and (c) highlight associated risks that might occur in the Product Lifecycle.
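As an illustration of the control above, the following is a minimal sketch of one way to detect Feature mismatch between training and production data. It assumes each data source can be summarised as a mapping from Feature name to a type label; the function name, the schema format, and the example Features are hypothetical and not prescribed by this guideline.

```python
from typing import Dict, List


def compare_feature_schemas(
    training: Dict[str, str], production: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare Feature-name -> type-label mappings (an assumed schema format)."""
    shared = set(training) & set(production)
    return {
        # Features the Model was trained on but production does not supply.
        "missing_in_production": sorted(set(training) - set(production)),
        # Features production supplies that the Model never saw in training.
        "unexpected_in_production": sorted(set(production) - set(training)),
        # Features present in both, but with different type labels.
        "type_mismatch": sorted(
            name for name in shared if training[name] != production[name]
        ),
    }


# Hypothetical schemas, for illustration only.
training_schema = {"age": "int", "income": "float", "country": "str"}
production_schema = {"age": "int", "income": "str", "region": "str"}

report = compare_feature_schemas(training_schema, production_schema)
print(report)
```

Any non-empty list in the report could be treated as a trigger for the corrective measures described in the control.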

15.2.2. Data Definitions and Measurements: Training & Production Data

Define and deploy methods to detect the degree to which data sources in Model training and production have the same definitions and measurement scales.

To (a) reduce nonsensical predictions of the Model due to (i) missing data, (ii) lack of data incorporated, or (iii) data measurement scaling, encoding and/or meaning; (b) reduce the discrepancy between training and production data; and (c) highlight associated risks that might occur in the Product Lifecycle.
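One simple heuristic for spotting a measurement-scale mismatch is to measure how much production data falls outside the range observed in training. The sketch below assumes a single numeric Feature; the function name and the example values (heights recorded in metres during training and mostly in centimetres in production) are hypothetical.

```python
from typing import Sequence


def out_of_range_fraction(
    training_values: Sequence[float], production_values: Sequence[float]
) -> float:
    """Fraction of production values outside the training minimum/maximum."""
    if not training_values or not production_values:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(training_values), max(training_values)
    outside = sum(1 for v in production_values if v < lo or v > hi)
    return outside / len(production_values)


# Hypothetical data: training in metres, production mostly in centimetres.
heights_training_m = [1.52, 1.60, 1.75, 1.81, 1.93]
heights_production = [158.0, 171.0, 180.0, 1.70]

fraction = out_of_range_fraction(heights_training_m, heights_production)
print(fraction)  # 3 of the 4 production values lie outside the training range
```

A high fraction does not prove a unit or definition change, since genuine drift produces the same signal, so it is best read as a prompt to check the data definitions.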

15.2.3. Data Dependencies and Upstream Changes

Derive and implement change assessments for changes in data due to (i) full or partial updates from one or more internal or external sources, (ii) substantial source changes, and/or (iii) changes in data production and/or delivery.

To (a) reduce nonsensical predictions of the Model due to (i) missing data, (ii) lack of data incorporated, or (iii) data measurement scaling, encoding and/or meaning; (b) reduce the discrepancy between training and production data; and (c) highlight associated risks that might occur in the Product Lifecycle.

15.2.4. Data Drift Detection

Define and deploy monitoring metrics and thresholds for detecting sudden and/or gradual, short-term and/or long-term changes in data distributions, giving priority to those that can detect previously observed changes (see Section 12.2.1. - Missing and Bad Data Handling for further information). Document and assess distribution families, statistical moments, similarity measures, trends and seasonalities.

To (a) prevent predictions from diverging from training data and/or Product Definitions by assessing whether production data is representative of older data; and (b) highlight associated risks that might occur in the Product Lifecycle.
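The control above does not prescribe a particular similarity measure. As one possible example, the sketch below computes the Population Stability Index (PSI) between a reference sample and a production sample of a single numeric Feature, using equal-width bins derived from the reference sample. The bin count, the smoothing constant `eps`, and the alert threshold are assumptions chosen for illustration and should be set per Product.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI over equal-width bins spanning the reference sample's range."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical samples: production values shifted upwards by 30 units.
reference_sample = [float(v) for v in range(100)]
shifted_sample = [v + 30.0 for v in reference_sample]

psi_same = population_stability_index(reference_sample, reference_sample)
psi_shifted = population_stability_index(reference_sample, shifted_sample)

ALERT_THRESHOLD = 0.25  # assumed value; agree the real threshold with Stakeholders
print(psi_same, psi_shifted, psi_shifted > ALERT_THRESHOLD)
```

PSI captures changes in the shape of a distribution but not trends or seasonality, so it would complement rather than replace the other assessments listed in the control.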

15.2.5. Product and/or Product Domain Changes: Trends and Preferences

Define and deploy (a) monitoring methods for detecting changes in Product Domain(s) and/or Product Definition(s); and (b) timeframes and/or contextual triggers for reassessing the continued stability of Product Domain(s) and Product Definition(s).

To (a) ensure Models capture accurate, relevant, and current trends and preferences in Product Domain(s); (b) reduce Model 'blind spots' and better capture malicious events/attempts; and (c) highlight associated risks that might occur in the Product Lifecycle.

15.3. Development

Objective
To (a) create metrics for (i) Model performance and (ii) Model performance deterioration; and (b) ensure the continued monitoring of Products and/or Models throughout their lifecycles.
Item nr. Item Name and Page Control Aim
15.3.1. Model Performance Deterioration Thresholds

Document, assess, and set thresholds for Model performance deterioration in consultation with Stakeholders.

To (a) ensure clear guidelines and indices of Model failure and performance deterioration; (b) reduce the risk of unacknowledged Model failure and performance deterioration; (c) reduce the likelihood of Model decay, ensure robustness and good performance in terms of selected metrics and scenarios; and (d) highlight associated risks that might occur in the Product Lifecycle.
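Once agreed with Stakeholders, a deterioration threshold can be encoded so that it is checked automatically. The sketch below is one possible encoding, assuming a higher-is-better metric, a fixed baseline, and a rule that an alert is raised only after several consecutive breaches so that a single noisy evaluation window is not enough. The class name, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DeteriorationThreshold:
    metric: str               # name of a higher-is-better metric, e.g. accuracy
    baseline: float           # performance accepted at deployment time
    max_absolute_drop: float  # tolerated drop below the baseline

    def is_breached(self, observed: float) -> bool:
        return (self.baseline - observed) > self.max_absolute_drop


def should_alert(
    observations: Iterable[float],
    threshold: DeteriorationThreshold,
    consecutive: int = 3,
) -> bool:
    """True once `consecutive` successive observations breach the threshold."""
    run = 0
    for value in observations:
        run = run + 1 if threshold.is_breached(value) else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical threshold and weekly accuracy readings.
accuracy_threshold = DeteriorationThreshold("accuracy", 0.90, 0.05)
sustained_drop = [0.89, 0.84, 0.83, 0.82]
isolated_dips = [0.84, 0.89, 0.84, 0.83]

print(should_alert(sustained_drop, accuracy_threshold))
print(should_alert(isolated_dips, accuracy_threshold))
```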

15.3.2. Product Contextual Indicators: Model Performance Deterioration

Document, assess, and set Product and Product Domain specific indicators of Model performance deterioration, inclusive of technical and non-technical indicators.

To (a) ensure clear guidelines and indices of Model failure and performance deterioration; (b) reduce the risk of unacknowledged Model failure and performance deterioration; (c) reduce the likelihood of Model decay, ensure robustness and good performance in terms of selected metrics and scenarios; and (d) highlight associated risks that might occur in the Product Lifecycle.

15.3.3. Reactive Model Maintenance Indicators

Document, assess, and set thresholds for Model failure and reactive maintenance.

To (a) ensure clear guidelines and indices of Model failure and performance deterioration; (b) reduce the risk of unacknowledged Model failure and performance deterioration; (c) reduce the likelihood of Model decay, ensure robustness and good performance in terms of selected metrics and scenarios; and (d) highlight associated risks that might occur in the Product Lifecycle.

15.3.4. Awareness of Feedback Loops

Define and deploy, as far as is reasonably practical, (a) methods to detect whether feedback loops are occurring; and/or (b) technical and non-technical warning indicators for increased risk of the same.

As per Section 17 - Security: to (a) prevent (in)direct adverse social and environmental effects as a consequence of self-reinforcing interactions with the Model(s); and (b) highlight associated risks that might occur in the Product Lifecycle.

15.4. Production

Objective
To (a) identify operational maintenance metrics; and (b) ensure timely update, re-train and re-deployment of Model(s).
Item nr. Item Name and Page Control Aim
15.4.1. Operational Performance Thresholds

Define and set metrics and tolerance intervals for the operational performance of Models and Products, including, amongst other things, latency, memory footprint, and CPU and GPU usage.

To (a) prevent unavailable and unreliable service; (b) enable quick detection of bugs in the code; (c) ensure smooth integration of the Model with the rest of the systems; and (d) highlight associated risks that might occur in the Product Lifecycle.
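Tolerance intervals of this kind can be written down as data and checked against observed operational metrics. The sketch below assumes a nearest-rank 95th-percentile latency as the latency metric and treats a metric that was not reported as a breach. The metric names, interval bounds, and readings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def out_of_tolerance(
    observed: Dict[str, float], tolerances: Dict[str, Tuple[float, float]]
) -> List[str]:
    """Names of metrics that are missing or outside their [low, high] interval."""
    return sorted(
        name
        for name, (low, high) in tolerances.items()
        if name not in observed or not low <= observed[name] <= high
    )


# Hypothetical tolerance intervals and readings.
tolerances = {
    "latency_p95_ms": (0.0, 100.0),
    "memory_mb": (0.0, 1024.0),
    "cpu_utilisation": (0.0, 0.8),
}
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14]
observed = {
    "latency_p95_ms": percentile(latencies_ms, 95),
    "memory_mb": 512.0,
    "cpu_utilisation": 0.45,
}

breaches = out_of_tolerance(observed, tolerances)
print(breaches)
```

In this example a single slow request dominates the 95th percentile because the sample is small; in practice the window size would be chosen so that the percentile is stable.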

15.4.2. Continuous Delivery of Metrics: Model Performance

Continuously report on and record metrics about Model performance, predictions, errors, Features, and associated performance metrics to relevant Stakeholders (as decided upon in Section 13.2. - Representativeness & Specification Exploration and Section 13.3. - Representativeness & Specification Development).

To (a) enable rapid identification of Model decay, and/or red flags and bugs in Model and/or data pipelines; and (b) highlight associated risks that might occur in the Product Lifecycle.

15.4.3. Model Decay & Data Updates

Operationalise procedures to mitigate Data Drift and/or Model decay (as described in Section 14.2. - Performance Robustness Exploration and Section 14.3. - Performance Robustness Development).

To (a) ensure timely implementation of any changes required in data and/or Modelling pipelines; and (b) highlight associated risks that might occur in the Product Lifecycle.

15.4.4. Model Re-training

Operationalise procedures for how Model re-training ought to be approached and conducted, inclusive of, amongst other things, (1) when (i) a new Model, and/or (ii) a Model with the same hyperparameters but trained on new data, will be deployed; and/or (2) when operationalising re-trained Models, whether they ought to be run in parallel with older Models and/or how to gracefully decommission older Models.

To (a) ensure timely implementation of any changes required in data and/or Modelling pipelines; and (b) highlight associated risks that might occur in the Product Lifecycle.
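Where a re-trained Model is run in parallel with the older Model, one simple comparison is the rate at which the two agree on the same production inputs. The sketch below assumes both Models can be called as plain functions that return a discrete prediction; the function name, the stand-in Models, and the inputs are hypothetical.

```python
from typing import Callable, Hashable, Sequence, TypeVar

X = TypeVar("X")


def parallel_run_agreement(
    older_model: Callable[[X], Hashable],
    retrained_model: Callable[[X], Hashable],
    inputs: Sequence[X],
) -> float:
    """Fraction of inputs on which both Models return the same prediction."""
    if not inputs:
        raise ValueError("at least one input is required")
    agreements = sum(1 for x in inputs if older_model(x) == retrained_model(x))
    return agreements / len(inputs)


# Hypothetical stand-ins: two threshold classifiers over a single score.
def older_model(score: float) -> bool:
    return score >= 0.5


def retrained_model(score: float) -> bool:
    return score >= 0.6


production_scores = [0.1, 0.4, 0.55, 0.7, 0.9]
agreement = parallel_run_agreement(older_model, retrained_model, production_scores)
print(agreement)  # the two Models disagree only on the input 0.55
```

Agreement alone does not show which Model is better, so it would normally be read alongside the performance metrics and thresholds defined in Section 15.3.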

15.4.5. Create Contingency Plans

Develop and put in place contingency plans in case of technical failures and out-of-bounds behaviour, based on (a) bounds and thresholds set in other controls; and (b) risk assessment of failure modes.

To (a) prevent adverse effects from failures and unexpected behaviour by providing clear instructions on roll-back, mitigation and remediation; and (b) highlight associated risks that might occur in the Product Lifecycle.
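Part of a contingency plan can be enforced in code at prediction time. The sketch below shows one possible mitigation, assuming a Model that returns a single numeric prediction: if the Model raises an error or returns a value outside agreed bounds, a pre-agreed fallback value is returned together with a status label that can be logged. The function name, bounds, status labels, and fallback value are hypothetical.

```python
from typing import Any, Callable, Tuple


def predict_with_fallback(
    model: Callable[[Any], float],
    features: Any,
    lower: float,
    upper: float,
    fallback: float,
) -> Tuple[float, str]:
    """Return (prediction, status), substituting `fallback` on failure."""
    try:
        value = model(features)
    except Exception:
        # Technical failure: serve the fallback and report the cause.
        return fallback, "model_error"
    # NaN fails this comparison as well, so it is also treated as out of bounds.
    if not lower <= value <= upper:
        return fallback, "out_of_bounds"
    return value, "ok"


# Hypothetical stand-in Models, for illustration only.
def healthy_model(features: Any) -> float:
    return 0.7


def runaway_model(features: Any) -> float:
    return 3.2


def broken_model(features: Any) -> float:
    raise RuntimeError("upstream feature store unavailable")


for candidate in (healthy_model, runaway_model, broken_model):
    print(predict_with_fallback(candidate, {}, 0.0, 1.0, fallback=0.5))
```

A fallback of this kind only covers mitigation; roll-back and remediation steps would still need to be documented separately.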