Representativeness & Specification

From The Foundation for Best Practices in Machine Learning

Objective
To (a) ensure that Product data and Model(s) are representative of, and accurately specified for, Product Domain as far as is reasonably practical; and (b) guard against unintentional Product and Model behaviour and Outcomes as far as is reasonably practical.


13.1. Product Definition(s)

Objective
To (a) ensure the pragmatic formulation and accurate specification of Product Definition(s); (b) minimise Model simplifications, assumptions and ambiguities; and (c) maintain adequate vigilance over those that cannot be eliminated throughout the Product Lifecycle.
Item nr. | Item Name and Page | Control | Aim
13.1.1. R&S Product Definition(s) Assessment

Document and assess whether recorded Product Definition(s) are complete, unambiguous and representative of intended Product Outcomes. If they are not, refine them as much as is reasonably practical.

To (a) enable reliable execution of all further research, development and assessments; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.1.2. Product Assumptions

Document and assess Product assumptions, the likelihood of their appropriateness, their continued validity, and inherent risks.

To (a) detect, mitigate and review Product assumptions and their inherent risks; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.1.3. Product Simplifications

Document and assess Product simplifications, the likelihood of their appropriateness, and their inherent risks.

To (a) detect, mitigate and review Product simplifications and their inherent risks; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.1.4. Product Limits

Document and assess the limitations of the Product's application and the applicability of Product Definitions.

To (a) detect and review Model limitations in light of (i) Model assumptions and (ii) Model simplifications; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.1.5. R&S Problem Definition Review

R&S Product Definition(s) should be reviewed continually, and in particular whenever significant Model changes occur.

To (a) ensure that R&S Product Definition(s) are kept up to date so that they remain effective, suitable and accurate; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.2. Exploration

Objective
To (a) ensure that Model dataset(s) correspond to the Product Definition in sufficient detail, with sufficient completeness, and without material ambiguity; and (b) identify associated risks in order to ensure adequate vigilance throughout the Product Lifecycle.
Item nr. | Item Name and Page | Control | Aim
13.2.1. Data Subjectivity

Document and assess whether the Model dataset(s) contain subjective components. If subjective components are present, take measures to handle or avoid subjectivity risks in Product and/or Model design as much as is reasonably practical.

To (a) assess and control for the accuracy of the specification of Model inputs, manipulations, Outcomes, and interpretations to ensure the unambiguous applicability of Model(s) in Product Domain(s); and (b) highlight associated risks that might occur in the Product Lifecycle.

13.2.2. Heterogeneous Variable Simplification

Document and assess whether Model datasets contain, or Model components produce, simplified input Features that represent inherently heterogeneous concepts in Product Domains. If simplified, take measures to reflect the heterogeneity of Product Domains as much as is reasonably practical.

To (a) detect, review and control for the simplification of heterogeneous input Variables; (b) prevent generalization and spurious correlation; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.2.3. Hidden Variables

Document and assess whether Model datasets are missing, or Model components hide, relevant attributes of Product Subjects or systemic Variables with respect to Product Domains. If so, obtain additional data and/or account for the hidden Variables in modelling as much as is reasonably practical.

To (a) assess and control for hidden correlations and causal relations in Model datasets and Variables and/or risks of relations being spurious, ambiguous and/or confounding; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.2.4. Selection Function

Document and assess the propensity of subpopulations and subpopulation members to be (accurately) recorded in Model datasets, with particular care for (i) unrecorded individuals, (ii) Protected Classes, and (iii) survivorship effects. Incorporate the Selection Function in Model development and evaluation, in particular during the Fairness & Non-Discrimination and Performance Robustness controls.

To (a) assess and control for the accuracy of Model and Model datasets in representing (Sub)populations; and (b) highlight associated risks that might occur in the Product Lifecycle.
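One rough way to make the Selection Function concrete is to compare each subpopulation's share of the recorded rows with its share of a reference population. The sketch below is only illustrative: the group labels, the reference shares and the 0.8 flagging threshold are hypothetical assumptions, and a real assessment would also need to consider unrecorded individuals and survivorship effects that such a ratio cannot reveal.

```python
from collections import Counter

def representation_ratios(sample_groups, reference_shares):
    """Share of each group in the dataset divided by its share in the reference population.

    A ratio below 1 suggests the group is under-recorded relative to the reference.
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    ratios = {}
    for group, ref_share in reference_shares.items():
        sample_share = counts.get(group, 0) / total
        ratios[group] = sample_share / ref_share
    return ratios

# Hypothetical group labels of recorded rows and assumed population shares.
recorded = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
population = {"a": 0.5, "b": 0.3, "c": 0.2}

ratios = representation_ratios(recorded, population)
# Flag groups whose recorded share falls below 80% of their reference share
# (the 0.8 threshold is an arbitrary choice for illustration).
flagged = sorted(g for g, r in ratios.items() if r < 0.8)
```

The inverse of such ratios is sometimes used as a starting point for reweighting, though whether that is appropriate depends on the Product Domain.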

13.2.5. Feature Constraints

Evaluate, in consultation with Domain experts, whether any constraints should be applied to input Features, such as monotonicity or constraints on input Feature interactions. If such constraints are identified, apply them.

To (a) ensure that (i) Model Outcomes are maximally interpretable and (ii) Model behavior for individual Model Subjects is consistent with Domain experience; and (b) highlight associated risks that might occur in the Product Lifecycle.
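As an illustration of how a monotonicity constraint might be checked after the fact, the sketch below sweeps one input Feature over an increasing grid while holding the others fixed and verifies that the prediction never decreases. The scoring functions, the Feature names `income` and `debt`, and the grid are hypothetical; a check at one base row does not prove monotonicity everywhere, so in practice several representative rows would be tested.

```python
def is_monotone_increasing(predict, base_row, feature, grid, tol=1e-12):
    """Return True if predictions never decrease as `feature` sweeps over an increasing grid."""
    previous = None
    for value in grid:
        row = dict(base_row)      # copy so the caller's row is not modified
        row[feature] = value
        current = predict(row)
        if previous is not None and current < previous - tol:
            return False
        previous = current
    return True

# Hypothetical scoring functions standing in for a trained Model.
def score_ok(row):
    return 0.3 * row["income"] - 0.1 * row["debt"]

def score_bad(row):
    # Peaks at income == 5 and then falls, violating the constraint.
    return -(row["income"] - 5.0) ** 2

base = {"income": 0.0, "debt": 2.0}
grid = [float(i) for i in range(11)]

ok_result = is_monotone_increasing(score_ok, base, "income", grid)
bad_result = is_monotone_increasing(score_bad, base, "income", grid)
```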

13.3. Development

Objective
To (a) ensure that Model design is sufficiently specified to represent Product Domain(s) and the Product Definition(s) as much as is reasonably practical; and (b) minimise the risks of (i) adverse effects from the Model's optimisation leading to unintended loopholes and local optima, and (ii) mis-balancing competing optimisation requirements in Model design and development.
Item nr. | Item Name and Page | Control | Aim
13.3.1. Target Subjectivity

Document and assess whether the Target Feature(s) objectively represent Product Domain(s). If subjective, consider refining Product Definition(s), choosing a different Target Feature, or taking measures to promote the objectivity of Product Outcomes.

To (a) ensure that Product Outcomes are representative of subpopulations and applications, and are not misinterpreted; (b) ensure that Models are optimized only and precisely according to Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.3.2. Target Proxies

Document and assess whether the Target Feature(s) are proxies for the true Target(s) of Interest in Product Domain(s). If Target Features are proxies, take measures to ensure and review non-divergence of Product Outcomes with regard to Product Definitions.

To (a) ensure that Product Outcomes are representative of subpopulations and applications, and are not misinterpreted; (b) ensure that Models are optimized only and precisely according to Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.3.3. Target Proxy vs. True Target of Interest Contrasting

If the Target Feature is a proxy (i) document and assess whether the true Target(s) of Interest correlate with protected attributes and classes, including through hidden systemic Variables as much as is reasonably practical; and (ii) document and assess whether the true Target(s) of Interest and the proxy Target Feature(s) correlate differently with the Model datasets. If true, take measures to mitigate this as much as is reasonably practical.

To (a) ensure that the Model design is oriented to the true Target(s) of Interest; and (b) highlight associated risks that might occur in the lack thereof in the Product Lifecycle.
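Where a small audited sample exists in which both the proxy Target Feature and the true Target of Interest are known, one simple contrast is to compare how each correlates with the same input. The sketch below assumes such a sample; the variable names and numbers are hypothetical, and Pearson correlation is only one of several possible measures.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_gaps(features, proxy, true_target):
    """For each feature: corr(feature, proxy) minus corr(feature, true target)."""
    return {
        name: pearson(values, proxy) - pearson(values, true_target)
        for name, values in features.items()
    }

# Hypothetical audited sample: the true target is distributed identically in
# both groups, but the proxy is systematically lower for group 1.
true_target = [1, 2, 3, 4, 1, 2, 3, 4]
group = [0, 0, 0, 0, 1, 1, 1, 1]
proxy = [t - g for t, g in zip(true_target, group)]

gaps = correlation_gaps({"group": group}, proxy, true_target)
```

In this constructed example the group indicator is uncorrelated with the true target but negatively correlated with the proxy, which is the kind of divergence this item asks to be documented.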

13.3.4. Heterogeneous Target Variable Simplification

Document and assess whether the Target Feature is a simplification of, or contains a subset of, true Target(s) of Interest. If true, consider refining Product Definitions, recovering the heterogeneity, or failing that, take measures to mitigate and review this as much as is reasonably practical.

As for Items 13.3.1 and 13.3.2; and to detect and control for risks of generalization and spurious correlation creation.

13.3.5. Cost Function Specification & Optimisation

Document and assess the risk propensity, through the Model cost function and optimisation procedures during the Product Lifecycle, for (i) optimizing for a subset of objectives to the detriment of other Product objectives, (ii) optimizing for Outcomes that are unintended and/or not aligned with any Product objectives, (iii) feedback loops (when containing nested optimization loops), and (iv) Model confinement in adverse or less-than-optimal parameter or solution space. If risks occur, take measures to mitigate them as much as is reasonably practical.

To (a) ensure the adequate optimisation of Product Definitions through an assessment of the cost function and optimization procedure; (b) to respect the boundary conditions and requirements set by the Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.3.6. Importance Weighting

Document and assess whether Model data points are weighted by design or as collateral effect.

To (a) ensure the adequate optimisation of Product Definitions through an assessment of the cost function and optimization procedure; (b) to respect the boundary conditions and requirements set by the Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.
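Data points can be weighted as a collateral effect simply because some groups contribute more rows than others. A minimal sketch of how this might be surfaced is to compute each group's share of the total weight in the cost function, with and without explicit weights. The group labels and weights below are hypothetical.

```python
from collections import defaultdict

def weight_share_by_group(groups, weights):
    """Fraction of the total sample weight contributed by each group."""
    totals = defaultdict(float)
    for group, weight in zip(groups, weights):
        totals[group] += weight
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}

# Hypothetical dataset with an 8:2 imbalance between two groups.
groups = ["x"] * 8 + ["y"] * 2

# No explicit weights: group "x" implicitly dominates the cost function.
implicit = weight_share_by_group(groups, [1.0] * len(groups))

# Inverse-frequency weights by design: both groups contribute equally.
balanced = weight_share_by_group(groups, [1 / 8] * 8 + [1 / 2] * 2)
```

Whether equal contribution is desirable is a Product Definition question; the point of the sketch is only that the effective weighting should be made visible and documented.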

13.3.7. Asymmetric Error Weights

Document and assess whether Model errors, and error rates, are weighted asymmetrically in the Model.

To (a) ensure the adequate optimisation of Product Definitions through an assessment of the cost function and optimization procedure; (b) to respect the boundary conditions and requirements set by the Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.3.8. Feature Weighting

Document and assess whether Model Features are weighted by design or as collateral effect.

To (a) ensure the adequate optimisation of Product Definitions through an assessment of the cost function and optimization procedure; (b) to respect the boundary conditions and requirements set by the Product Definitions; and (c) highlight associated risks that might occur in the Product Lifecycle.

13.3.9. Output Interpretation(s)

Document and assess whether the interpretation of the Model Outcomes is clearly, completely and unambiguously defined. If not, take measures to promote the clarity and completeness of Outcome interpretation(s) as much as is reasonably practical.

To (a) guard against the misinterpretation and/or misapplication of Model Outcomes; and (b) highlight associated risks that might occur in the Product Lifecycle.

13.3.10. Time-dependent Data Modeling

Document and assess whether all time-dependent aspects of data generation (including but not limited to gathering, calibration, cleaning, and annotation), data modeling and data usage are specified and incorporated in Model design and Product Definition(s).

To (a) prevent data leakage and other forms of "time traveling" information leading to inaccurate representations of the data and/or Data Subjects; and (b) highlight associated risks that might occur in the Product Lifecycle.
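Two basic safeguards against "time traveling" information can be sketched as follows: flag rows in which any Feature value was recorded after the moment the prediction would have been made, and split training and evaluation data by time rather than at random. The row layout (`predicted_on`, `feature_dates`) and the dates are hypothetical assumptions made for the example.

```python
from datetime import date

def leaking_rows(rows):
    """Ids of rows where a feature value became available after the prediction date."""
    return [
        row["id"]
        for row in rows
        if any(seen > row["predicted_on"] for seen in row["feature_dates"].values())
    ]

def temporal_split(rows, cutoff):
    """Train on rows predicted before the cutoff; evaluate on rows at or after it."""
    train = [row for row in rows if row["predicted_on"] < cutoff]
    test = [row for row in rows if row["predicted_on"] >= cutoff]
    return train, test

# Hypothetical rows; row 2 carries a value that was annotated after the fact.
rows = [
    {"id": 1, "predicted_on": date(2023, 1, 10),
     "feature_dates": {"balance": date(2023, 1, 5)}},
    {"id": 2, "predicted_on": date(2023, 2, 1),
     "feature_dates": {"balance": date(2023, 2, 20)}},
    {"id": 3, "predicted_on": date(2023, 3, 15),
     "feature_dates": {"balance": date(2023, 3, 1)}},
]

leaks = leaking_rows(rows)
train, test = temporal_split(rows, date(2023, 3, 1))
```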

13.4. Production

Objective
To ensure that the Implementation of the Product and Model(s) align with and represent Product Definition(s) and Product Domain(s).
Item nr. | Item Name and Page | Control | Aim
13.4.1. Asymmetric Error Costs

Document and assess whether the Product Domain(s) costs produced by different Model error types are accounted for in Product implementation and application in software and processes. If not, take measures to ensure that they are.

To (a) ensure that the consequences for Product Domain(s) and Product Subjects are accurately considered when implementing Product Outcomes; and (b) highlight associated risks that might occur in the Product Lifecycle.
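One way asymmetric error costs can be reflected in implementation is by choosing the decision threshold that minimises the average Domain cost rather than the raw error count. The sketch below assumes a binary classifier with scores in [0, 1]; the scores, labels, candidate thresholds and cost values are all hypothetical.

```python
def expected_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Average cost per case when predicting positive for scores >= threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 0:
            cost += cost_fp      # false positive
        elif not predicted_positive and label == 1:
            cost += cost_fn      # false negative
    return cost / len(scores)

def best_threshold(scores, labels, candidates, cost_fp, cost_fn):
    """Candidate threshold with the lowest average cost (first one wins ties)."""
    return min(
        candidates,
        key=lambda t: expected_cost(scores, labels, t, cost_fp, cost_fn),
    )

# Hypothetical validation scores and true labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
candidates = [0.25, 0.45, 0.65]

# When missed positives are costly, a lower threshold is preferred;
# when false alarms are costly, a higher one is.
t_fn_costly = best_threshold(scores, labels, candidates, cost_fp=1, cost_fn=5)
t_fp_costly = best_threshold(scores, labels, candidates, cost_fp=5, cost_fn=1)
```

The same data therefore leads to different operating points depending on the cost assumptions, which is why those assumptions need to be documented alongside the implementation.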

13.4.2. Output Interpretation(s) - Production

Document and assess whether Product Outcomes can be clearly, completely and unambiguously interpreted by non-technical parties and whether these interpretations remain representative of Product Definition(s) and Model inner workings. If not, take measures to ensure that they are as much as is reasonably practical.

To (a) prevent (i) misinterpretation of Product Outcomes, (ii) the application of Products in contexts and/or to Subjects for which their appropriateness and/or quality is unconfirmed, unknown, and/or unsatisfactory, and (iii) the intentional and/or unintentional misuse of Product components and Outcomes; and (b) highlight associated risks that might occur in the Product Lifecycle.