Data Buy v. Build Analysis

From The Foundation for Best Practices in Machine Learning

Control

The Product Team should work cross-functionally with relevant Stakeholders to define and document a Buy v. Build checklist that considers the following areas:
(a) Does the Organisation have enough data for every stage of the process (training, POC, production) and for every purpose (replacing stale/flawed data, measuring success)?
(b) Does the Organisation have the right type of data for every stage of the process (training, POC, production) and for every purpose (replacing stale/flawed data, measuring success)?
(c) Is bought data secure and free of privacy concerns?
(d) Is the bias in the bought data limited, mitigatable, or removable?
(e) Given the results of the Data Quality Analysis, does the Organisation have quality data, and are its datasets complete?
(f) Given the Product Team Composition, does the Organisation have the staffing and expertise to clean, prepare, and maintain internal data?
(g) Given the Data Capacity Analysis, is the necessary data easily and readily available internally?
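One possible way to document such a checklist in a form that can be versioned and reviewed is sketched below. It is a minimal illustration, not a prescribed format: the names `Answer`, `ChecklistItem`, `new_checklist`, and `open_items` are hypothetical, the question wording is abbreviated from areas (a) to (g) above, and the rule that any item not answered "yes" is flagged for review is an assumption made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


@dataclass
class ChecklistItem:
    key: str                          # label from the Control text, e.g. "a"
    question: str                     # abbreviated wording of the checklist area
    answer: Answer = Answer.UNKNOWN   # recorded by the Product Team
    notes: str = ""                   # evidence, e.g. a pointer to the supporting analysis


def new_checklist() -> list[ChecklistItem]:
    """Return the seven areas (a)-(g) with no answers recorded yet."""
    return [
        ChecklistItem("a", "Enough data for every stage and purpose?"),
        ChecklistItem("b", "Right type of data for every stage and purpose?"),
        ChecklistItem("c", "Bought data secure and free of privacy concerns?"),
        ChecklistItem("d", "Bias in bought data limited, mitigatable, or removable?"),
        ChecklistItem("e", "Quality data and complete datasets (Data Quality Analysis)?"),
        ChecklistItem("f", "Staffing and expertise to maintain internal data?"),
        ChecklistItem("g", "Necessary data readily available internally (Data Capacity Analysis)?"),
    ]


def open_items(checklist: list[ChecklistItem]) -> list[ChecklistItem]:
    """Items not answered 'yes'; assumed here to be the ones needing follow-up."""
    return [item for item in checklist if item.answer is not Answer.YES]


if __name__ == "__main__":
    checklist = new_checklist()
    checklist[0].answer = Answer.YES
    checklist[0].notes = "Illustrative entry only"
    for item in open_items(checklist):
        print(f"({item.key}) {item.question} -> {item.answer.value}")
```

The sketch deliberately stops short of computing a buy or build decision. Areas (c) and (d) concern bought data while the others concern internal data, so the answers need to be weighed by the Product Team and Stakeholders rather than summed.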


Aim

To (a) ensure that the Organisation's decision to either purchase data or use in-house data is appropriate given the Organisation's capacity and/or constraints; and (b) highlight associated risks that might arise in the Product Lifecycle.


Additional Information