Performance Robustness
14.1. Product Definition(s)
Item nr. | Item Name | Control | Aim |
---|---|---|---|
14.1.1. | Product Definition(s) Stability | Document and assess the stability of historic and prospective Product Definition(s) and Product Aim(s). If unstable, take measures to redefine or, failing that, to correct for or mitigate as much as is reasonably practical. | To (a) ensure that Product Definition(s) and Models remain stable and up-to-date in light of Product Domain Stability; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.1.2. | Product Domain Stability | Document and assess the stability of historic and prospective Product Domain(s). If unstable, revise Product Definition(s) accordingly to ensure Product consistency and stability. | To (a) ensure that Product Definition(s) and Models remain stable and up-to-date in light of Product Domain Stability; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.2. Exploration
Item nr. | Item Name | Control | Aim |
---|---|---|---|
14.2.1. | Data Drift Assessment | Document and assess historic and prospective changes in data distribution, inclusive of missing and nonsensical data. If data drift is apparent and/or expected in the future, implement mitigating measures as much as is reasonably practical. (See the drift-detection sketch after this table.) | To (a) assess and promote the stability of data distributions (data drift); (b) determine the need for data distribution monitoring, risk-based mitigation strategies and responses, drift resistance and adaptation simulations and optimization, and data distribution calibration; and (c) highlight associated risks that might occur in the Product Lifecycle. |
14.2.2. | Data Definition Temporal Stability | Document and assess - both technically and conceptually - historic and prospective changes of each data dimension definition. If unstable, consider refining Product Definitions and/or limiting usage of unstable data dimensions. | To (a) assess and control for the need for Model design adaptation based on data definition stability; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.2.3. | Outlier Occurrence Rates | Document and assess outliers, their causes, and occurrence rates as a function of their location in data space. If numerous and persistent, include mitigating measures in Model design accordingly. | To (a) identify outliers and assess the need for Model design adaptation; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.2.4. | Selection Function Temporal Stability | Document and assess the historic and prospective behaviour of Selection Function(s) of Model data. (See Section 13.2.4. - Selection Function for more information.) If unstable, take measures to account for past and future changes, and/or promote the consistency and representativeness of Model datasets and data gathering as much as is reasonably practical. | To (a) assess and control for hard-to-measure changes to the relation between Model datasets and Product Domain(s); (b) identify the risk of hard-to-diagnose Model performance degradation and bias throughout the Product Lifecycle (to be controlled by 14.3.6. - Model Drift & Model Robustness Simulations); and (c) highlight associated risks that might occur in the Product Lifecycle. |
14.2.5. | Data Generating Process Temporal Stability | Document and assess the historic and prospective behaviour of data generating processes, and their influence on the Selection Function. If unstable, take measures to account for past and future changes and/or promote the stability and consistency of data generation processes as much as is reasonably practical. | To (a) assess and control for hard-to-measure changes to the relation between Model datasets and Product Domain(s); (b) identify the risk of hard-to-diagnose Model performance degradation and bias throughout the Product Lifecycle (to be controlled by 14.3.6. - Model Drift & Model Robustness Simulations); and (c) highlight associated risks that might occur in the Product Lifecycle. |
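As a hedged illustration of 14.2.1. - Data Drift Assessment, the sketch below compares a reference (development-time) sample of each numeric feature against a recent production sample using the Population Stability Index and a two-sample Kolmogorov-Smirnov test. It assumes pandas DataFrames of numeric features; the function names, the 0.2 PSI threshold and the 0.01 significance level are illustrative assumptions, not values prescribed by this standard.

```python
# Illustrative sketch only: per-feature drift check between a reference
# (development-time) sample and a recent production sample.
import numpy as np
from scipy import stats

def population_stability_index(reference, current, bins=10):
    """Population Stability Index between two 1-D numeric samples."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid division by zero / log(0) for empty bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

def drift_report(reference_df, current_df, psi_threshold=0.2, alpha=0.01):
    """Flag numeric features whose distribution appears to have drifted."""
    flagged = {}
    for column in reference_df.columns:
        ref = reference_df[column].dropna().to_numpy()
        cur = current_df[column].dropna().to_numpy()
        psi = population_stability_index(ref, cur)
        ks = stats.ks_2samp(ref, cur)
        if psi > psi_threshold or ks.pvalue < alpha:
            flagged[column] = {"psi": psi, "ks_pvalue": float(ks.pvalue)}
    return flagged
```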
14.3. Development
Item nr. | Item Name | Control | Aim |
---|---|---|---|
14.3.1. | Target Feature Definition Stability | Document and assess - both technically and conceptually - the historic and prospective stability of the Target Feature definition. If unstable, consider refining Product Definitions and/or choosing a different Target Feature. | To (a) assess the need for Model design and Product Definition adaptation based on Target Feature definition stability; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.3.2. | Blind Performance Validation | Document and validate that Model performance can always be reproduced on never-before-seen hold-out data-subsets, and demonstrate, by comparing Model performance on the hold-out dataset, that these hold-out data-subsets are never used to guide Model and Product design choices. If performance cannot be reproduced on a never-before-seen hold-out data-subset, take measures to improve robustness and Model fitting as much as is reasonably practical. (See the blind hold-out sketch after this table.) | To (a) ensure Model performance robustness against insufficient generalization capabilities on live data (such as overfitting); and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.3.3. | Error Distributions | Document and assess error and/or residual distributions along as many dimensions and/or subsets as is practically feasible. If distributions are too broad and/or too unequal between subsets, improve the Model(s). (See the per-subset error sketch after this table.) | To (a) assess and control for the performance influence of data points and/or groups; (b) assess and control for the influence of the error distribution on - (i) performance robustness as a function of data drift, (ii) the systematic performance of minority data-subsets, and (iii) the risks of unacceptable errors and/or catastrophic failure; and (c) highlight associated risks that might occur in the Product Lifecycle. |
14.3.4. | Output Edge Cases | Document and assess the causes, occurrence probabilities, and overall performance impact of Edge Cases output by Model(s), including their impact on Model training and design. If their influence is significant, improve Model design. If occurrence is high, increase Model, code and data quality control. | To (a) assess and control for the impact of Output Edge Cases on Model design, bugs and performance; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.3.5. | Performance Root Cause Analysis | Document and assess Model performance Root Cause Analysis as well as its testing method. If Root Cause Analysis is ineffective, simplify the Model and/or increase diagnostics such as logging and tracking. | To (a) assess and control for Model performance changes and assist in Model design, development, and debugging; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.3.6. | Model Drift & Model Robustness Simulations | Document and perform simulations of Model training and retraining cycles, using historic and synthetic data. Document and assess the effects of temporal changes to, amongst other things, the Selection Function, Data Generating Process and Data Drift on the drift in performance and error distributions of said simulations. If Model drift is apparent, document and perform further simulations for Model drift response optimization, and/or consider refining Product Definitions. | To (a) assess and control for Model propensity for Model drift; (b) determine the robustness of Model performance as a function of data changes; (c) determine the appropriate Product response to drift; and (d) highlight associated risks that might occur in the Product Lifecycle. |
14.3.7. | Catastrophic Failures | Document and assess the prevalence of predictions with High Confidence Values but large Evaluation Errors. If apparent, improve the Model to avoid these, and/or implement processes to mitigate these as much as is reasonably practical. (See the catastrophic-failure sketch after this table.) | To (a) assess the propensity of the Model for catastrophic failures; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.3.8. | Performance Uncertainty and Sensitivity Analysis | Document and assess the probability distribution of Model performance using cross-validation, statistical and simulation techniques under - (a) the assumption that the distribution of training and validation data is representative of the distribution of live data; and (b) multiple realistic variations to the Model data due to both statistical and contextual causes. If Model performance variation is high, improve the Model and/or take measures to mitigate the impact of performance variation. (See the performance uncertainty sketch after this table.) | To (a) assess and control for the range of expected values of Model performance under both constant and changing conditions; (b) assess and control for whether trained Model performance is consistent with these ranges; (c) identify main sources of uncertainty and variation for further control; and (d) highlight associated risks that might occur in the Product Lifecycle. |
14.3.9. | Outlier Handling | Document and assess the effect of various outlier handling procedures on (a) Performance Robustness and (b) Representativeness & Specification. Implement only those procedures that positively affect both. | To (a) ensure that outlier removal is not used heedlessly to improve test-time performance alone; and (b) highlight associated risks that might occur in the Product Lifecycle. |
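As a hedged illustration of 14.3.2. - Blind Performance Validation, the sketch below carves off a hold-out partition once, fingerprints it for audit purposes, and compares development-time performance against the untouched hold-out only for the final, frozen Model. The function names, the scikit-learn utilities, the accuracy metric and the 5-percentage-point tolerance are illustrative assumptions, not requirements of this standard.

```python
# Illustrative sketch only: a blind hold-out that must never inform design.
import hashlib
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def split_with_blind_holdout(X, y, holdout_fraction=0.2, seed=0):
    """Split once; the hold-out partition is set aside and never inspected."""
    X_dev, X_hold, y_dev, y_hold = train_test_split(
        X, y, test_size=holdout_fraction, random_state=seed, stratify=y)
    # Fingerprint the hold-out so later audits can show it was not altered.
    fingerprint = hashlib.sha256(np.ascontiguousarray(X_hold).tobytes()).hexdigest()
    return (X_dev, y_dev), (X_hold, y_hold), fingerprint

def blind_validation(final_model, dev_score, X_hold, y_hold, tolerance=0.05):
    """Compare development-time performance against the blind hold-out."""
    hold_score = accuracy_score(y_hold, final_model.predict(X_hold))
    return {"holdout_score": float(hold_score),
            "reproduced": (dev_score - hold_score) <= tolerance}
```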
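For 14.3.3. - Error Distributions, a minimal sketch of per-subset residual statistics is given below. The grouping variable, the use of pandas, and the chosen summary statistics are assumptions for illustration; any comparable breakdown of the error distribution per data subset would serve the same control.

```python
# Illustrative sketch only: residual statistics per data subset, to expose
# distributions that are too broad or too unequal between subsets.
import numpy as np
import pandas as pd

def error_distribution_by_subset(y_true, y_pred, groups):
    """Per-group residual statistics for a regression-style Model."""
    frame = pd.DataFrame({"residual": np.asarray(y_true) - np.asarray(y_pred),
                          "group": np.asarray(groups)})
    frame["abs_residual"] = frame["residual"].abs()
    summary = frame.groupby("group").agg(
        mean_residual=("residual", "mean"),
        std_residual=("residual", "std"),
        mae=("abs_residual", "mean"),
        p95_abs_error=("abs_residual", lambda r: r.quantile(0.95)))
    # Large differences between rows signal unequal performance across subsets.
    return summary
```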
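For 14.3.7. - Catastrophic Failures, the sketch below flags predictions that are simultaneously highly confident and wrong. It assumes a classifier exposing `predict_proba` and integer class labels aligned with its probability columns; the 0.9 confidence cut-off is an assumed example, not a prescribed value.

```python
# Illustrative sketch only: rate of "confidently wrong" predictions
# (High Confidence Value combined with a large Evaluation Error).
import numpy as np

def catastrophic_failure_rate(model, X, y_true, confidence_threshold=0.9):
    """Fraction and indices of highly confident but incorrect predictions."""
    probabilities = model.predict_proba(X)
    confidence = probabilities.max(axis=1)
    # Assumes y_true holds integer class indices matching the probability columns.
    predicted = probabilities.argmax(axis=1)
    confidently_wrong = (confidence >= confidence_threshold) & (predicted != np.asarray(y_true))
    return float(confidently_wrong.mean()), np.flatnonzero(confidently_wrong)
```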
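For 14.3.8. - Performance Uncertainty and Sensitivity Analysis, the sketch below estimates the spread of Model performance with repeated cross-validation, covering assumption (a) of the control. The scikit-learn utilities, scoring metric and fold counts are illustrative choices; sensitivity to contextual data variations (assumption (b)) would require additional, Product-specific simulations.

```python
# Illustrative sketch only: distribution of cross-validated Model performance.
import numpy as np
from sklearn.model_selection import RepeatedKFold, cross_val_score

def performance_uncertainty(model, X, y, scoring="neg_mean_absolute_error"):
    """Summary statistics of the cross-validated score distribution."""
    cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
    scores = cross_val_score(model, X, y, scoring=scoring, cv=cv)
    return {"mean": float(scores.mean()),
            "std": float(scores.std(ddof=1)),
            "p05": float(np.percentile(scores, 5)),
            "p95": float(np.percentile(scores, 95))}
```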
14.4. Production
Item nr. | Item Name | Control | Aim |
---|---|---|---|
14.4.1. | Real World Robustness | Document and assess potential future changes in the applied effects of the Product, such as through diminishing returns and/or psychological effects. If a significant change or decrease is expected, consider refining Product Definitions and/or developing procedures for mitigation. | To (a) assess and control for the variation in applied effects of the Product on Product Definition(s) and performance; and (b) highlight associated risks that might occur in the Product Lifecycle. |
14.4.2. | Performance Stress Testing | Perform and document experiments designed to attempt to induce failures in the Product and/or Model, for example, but not limited to, by supplying large quantities of data or unusual data to the training or inferencing phases. (See the stress-testing sketch after this table.) | To identify and control for risks associated with operational scenarios outside of the regimes encountered during Model development. |
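For 14.4.2. - Performance Stress Testing, the sketch below runs a few inference-time stress probes (extreme values, injected missing values, high-volume batches) and records failures rather than letting them crash the test harness. It assumes a 2-D numeric feature matrix, numeric Model outputs and a Model exposing `predict`; the specific perturbations and the batch multiplier are illustrative assumptions only.

```python
# Illustrative sketch only: inference-time stress probes that try to induce
# failures with out-of-range, missing-like and high-volume inputs.
import numpy as np

def stress_test_inference(model, X_reference, batch_repeat=100):
    """Run the Model on deliberately hostile inputs and record any failures."""
    X_reference = np.asarray(X_reference, dtype=float)
    probes = {
        "extreme_values": X_reference * 1e6,
        "nan_injected": np.where(np.random.rand(*X_reference.shape) < 0.1,
                                 np.nan, X_reference),
        "high_volume": np.tile(X_reference, (batch_repeat, 1)),
    }
    outcomes = {}
    for name, X_probe in probes.items():
        try:
            predictions = np.asarray(model.predict(X_probe), dtype=float)
            outcomes[name] = {"failed": False,
                              "non_finite_outputs": int(np.sum(~np.isfinite(predictions)))}
        except Exception as exc:  # record the failure instead of aborting the test run
            outcomes[name] = {"failed": True, "error": repr(exc)}
    return outcomes
```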