Missing and Bad Data Handling

From The Foundation for Best Practices in Machine Learning

Control

Document and assess how missing and nonsensical data (a) are handled in the Model, through datapoint exclusion or data imputation; (b) affect the Selection Function through datapoint removal; (c) affect Model performance and Fairness for subpopulations through data imputation. If (Sub)populations are unequally affected, take additional measures to increase data quality and/or improve Model resilience. Consult Domain experts during assessment and mitigation.
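As an illustration of the assessment step, the following Python sketch (standard library only) reports, per subgroup, how often a feature is missing or nonsensical and how excluding those rows would shift each subgroup's share of the data. The record layout (rows as dicts), the field names, the plausible-range thresholds used to flag nonsensical values, and median imputation are all hypothetical choices for this example, not something the control prescribes. Assessing the effect on Model performance and Fairness would additionally require evaluating the trained Model per subgroup.

```python
from collections import Counter
from statistics import median
from typing import Optional


def is_bad(value: Optional[float], low: float, high: float) -> bool:
    """Flag a value as bad if it is missing (None) or outside [low, high].

    The plausible range is a hypothetical, domain-specific choice that
    Domain experts would need to set.
    """
    return value is None or not (low <= value <= high)


def assess(rows, group_key, feature, low, high):
    """Per subgroup: bad-value rate and dataset share before/after exclusion."""
    totals = Counter(r[group_key] for r in rows)
    bad = Counter(r[group_key] for r in rows if is_bad(r[feature], low, high))
    kept = {g: totals[g] - bad[g] for g in totals}
    n_before = sum(totals.values())
    n_after = sum(kept.values())
    report = {}
    for g in totals:
        report[g] = {
            # Share of this subgroup's rows that are missing or nonsensical;
            # under imputation, this is the share of values that get imputed.
            "bad_rate": bad[g] / totals[g],
            # Subgroup share of the data before and after dropping bad rows,
            # i.e. how exclusion would change the Selection Function.
            "share_before": totals[g] / n_before,
            "share_after": kept[g] / n_after if n_after else 0.0,
        }
    return report


def impute_median(rows, feature, low, high):
    """Replace bad values with the median of valid values (one simple strategy)."""
    valid = [r[feature] for r in rows if not is_bad(r[feature], low, high)]
    fill = median(valid)
    return [
        {**r, feature: fill if is_bad(r[feature], low, high) else r[feature]}
        for r in rows
    ]


if __name__ == "__main__":
    # Hypothetical toy data: subgroup B has one missing and one negative age.
    rows = [
        {"group": "A", "age": 34}, {"group": "A", "age": 51},
        {"group": "A", "age": 29}, {"group": "A", "age": 42},
        {"group": "B", "age": None}, {"group": "B", "age": -3},
        {"group": "B", "age": 38}, {"group": "B", "age": 45},
    ]
    for group, stats in assess(rows, "group", "age", 0, 120).items():
        print(group, stats)
```

In this toy data both subgroups start at half of the dataset, but dropping bad rows leaves subgroup B with only a third of the remaining rows, while imputation would instead fill half of B's values with a single pooled median. Either outcome is the kind of unequal effect on a (Sub)population that the control asks to document and mitigate.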


Aim

To (a) prevent low-quality data from introducing bias into Model Outcomes; and (b) highlight associated risks that might arise during the Product Lifecycle.


Additional Information