Definitions

From The Foundation for Best Practices in Machine Learning
Revision as of 10:41, 28 March 2021 by Lorrie27.

Introduction

As used in this Best Practice Guideline, the following terms shall have the following meanings where capitalised. All references to the singular shall include references to the plural, where applicable, and vice versa. Any terms not defined or capitalised in this Best Practice Guideline shall have their plain meaning as commonly understood in English and in data science.


Dictionary

Absolute Reproducibility
a guarantee that any and all results, outputs, outcomes, artifacts, etc., can be exactly reproduced under any circumstances.
Best Practice Guideline
this document.
Confidence Value
a measure of a Model’s self-reported certainty that the given Output is correct.
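For a classifier, one common (but not the only) choice of Confidence Value is the largest class probability after a softmax over the Model's raw scores. The minimal Python sketch below assumes that convention; the function names are hypothetical. Because the value is self-reported, it is not necessarily calibrated: a Confidence Value of 0.9 need not mean the Output is correct 90% of the time.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtract the largest score before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return (predicted class index, Confidence Value) for one record.

    Assumption: the Confidence Value is taken to be the softmax probability
    of the predicted class.
    """
    probs = softmax(logits)
    # Index of the most probable class; ties resolve to the first such index.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


if __name__ == "__main__":
    print(predict_with_confidence([2.0, 1.0, 0.1]))  # class 0 with confidence of about 0.659
```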
Data Generating Process
the process, through physical and digital means, by which Records of data are created (usually representing events, objects or persons). The salient point is that any Data Generating Process (DGP) has idiosyncrasies, assumptions, limitations, etc., that cause the datasets it creates to share these limitations. The DGP is a primary influence on the Selection Function.
Data Science
an interdisciplinary field that uses scientific methods, processes, algorithms and computational systems to extract knowledge and insights from structured and/or unstructured data.
Domain(s)
the societal and/or commercial environment within which the Product will be and/or is operationalised.
Edge Case
an outlier in the space of both input Features and Model Output(s).
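Because an Edge Case must be an outlier in both the input Features and the Model Output(s), a simple screen can flag records that are extreme on both sides at once. The sketch below assumes numeric Features and a single numeric Output per record, and uses a z-score rule with a hypothetical threshold; the function names are illustrative, and other outlier criteria could be substituted.

```python
import statistics


def zscores(values):
    """Population z-scores; all zeros if the values do not vary."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def edge_cases(features, outputs, threshold=3.0):
    """Indices of records that are outliers in BOTH a Feature and the Model Output.

    features: list of records, each a list of numeric Feature values
    outputs:  one numeric Model Output per record
    threshold: hypothetical |z| cut-off for calling a value an outlier
    """
    n_features = len(features[0])
    # One list of z-scores per Feature column.
    feature_z = [zscores([row[j] for row in features]) for j in range(n_features)]
    output_z = zscores(outputs)
    flagged = []
    for i in range(len(outputs)):
        feature_outlier = any(abs(feature_z[j][i]) > threshold for j in range(n_features))
        output_outlier = abs(output_z[i]) > threshold
        if feature_outlier and output_outlier:
            flagged.append(i)
    return flagged
```

On very small samples a population z-score cannot exceed sqrt(n - 1), so the threshold should be chosen with the sample size in mind.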
Error Rate
the frequency of occurrence of errors in the (Sub)population relative to the size of the (Sub)population.
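For a classification Model, assuming an error means that the predicted Label differs from the ground truth, the Error Rate of a (Sub)population is the number of such errors divided by the number of records in that (Sub)population. In the sketch below, the function name `error_rate` and the boolean membership mask `in_group` are illustrative assumptions, not part of this Guideline.

```python
def error_rate(y_true, y_pred, in_group):
    """Fraction of records in a (Sub)population whose prediction is wrong.

    in_group: one boolean per record, True if the record belongs to the
              (Sub)population of interest (hypothetical representation).
    """
    # Keep only the (ground truth, prediction) pairs inside the (Sub)population.
    pairs = [(t, p) for t, p, g in zip(y_true, y_pred, in_group) if g]
    if not pairs:
        raise ValueError("the (Sub)population is empty")
    errors = sum(1 for t, p in pairs if t != p)
    return errors / len(pairs)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 1, 1, 0, 0, 1]
    in_group = [True, True, True, False, False, True]
    print(error_rate(y_true, y_pred, in_group))  # 1 error in 4 records: 0.25
```

Comparing this value across (Sub)populations, rather than only over the whole dataset, is what reveals whether errors are concentrated in particular groups.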
Evaluation Error
the difference between the ground truth and a Model’s prediction or output.
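For a numeric Target Variable, the Evaluation Error of each record can be sketched as ground truth minus prediction; this sign convention is an assumption, and prediction minus ground truth is also used. The helper names are hypothetical, and the mean absolute error is shown only as one common way to summarise the per-record errors.

```python
def evaluation_errors(y_true, y_pred):
    """Per-record Evaluation Error, taken here as ground truth minus prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [t - p for t, p in zip(y_true, y_pred)]


def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the per-record Evaluation Errors."""
    errs = evaluation_errors(y_true, y_pred)
    if not errs:
        raise ValueError("no records to evaluate")
    return sum(abs(e) for e in errs) / len(errs)


if __name__ == "__main__":
    print(evaluation_errors([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # [0.5, 0.0, -1.5]
```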
Fairness & Non-Discrimination
the property of Model(s) and Model outcome(s) to be free from bias against Protected Classes.
Features
the different attributes of datapoint(s) as recorded in the data.
Hidden Variable
an attribute of a datapoint or of a system that has a causal relation to other attributes, but is itself unmeasured or unmeasurable.
Human-Centric Design & Redress
orienting Product(s) and/or Model(s) to focus on humans and their environments by promoting human- and/or environment-centric value(s) and recourse(s) for redress.
Implementation
every aspect of the insertion of the Product and Model(s) into, and/or their application to, Organisation systems, infrastructure, processes and culture, as well as Domain(s) and society.
Incident
the occurrence of a technical event that affects the integrity of a Product and/or Model.
Label
the Feature that represents the (supposed) ground-truth values corresponding to the Target Variable.
Machine Learning
the use and development of computer systems and Model(s) that are able to learn and adapt with minimal explicit human instructions by using algorithms and statistical modelling to analyse, draw inferences, and derive output(s) from data.
Model
Machine Learning algorithm(s) and data processing designed, developed, trained and implemented to achieve set output(s), inclusive of datasets used for said purposes unless otherwise stated.
Organisation
the concerned juristic entity designing, developing and/or implementing Machine Learning.
Outcome
the resultant effect of applying Model(s) and/or Product(s).
Output
that which Model(s) produce, typically (but not exclusively) predictions or decisions.
Performance Robustness
the propensity of Product(s) and/or Model(s) to retain their desired performance across a wide and diverse range of operational conditions.
Product
the collective and broad process of design, development, implementation and operationalisation of Model(s), and associated processes, to execute and achieve Product Definition(s), inclusive of, amongst other things, the integration of such operations and/or Model(s) into Organisation product(s), software and/or system(s).
Product Manager
either a Design Owner and/or a Run Owner, as identified in Sections 3.1.4 and 3.1.7, respectively, of the Organisation Best Practice Guideline.
Product Team
the collective group of Organisation employees directly charged with designing, developing and/or implementing the Product.
Product Subjects
the entities and/or objects that are represented as data points in dataset(s) and/or Model(s), and who may be the subject of Product and/or Model outcome(s).
Project Lifecycle
the collective phases of Product(s) from initiation to termination - such as design, exploration, experimentation, development, implementation, operationalisation, and decommissioning - and their mutual iterations.
Protected Classes
(Sub)populations of Product Subjects, typically persons, that are protected by law, regulation or policy, or on the basis of the Product Definition(s).
Root Cause Analysis
the activity and/or report of the investigation into the primary causal reasons for the existence of some behaviour (usually an error or deviation).
Safety
real physical harms, within the Product Domain(s), that result from applications of Product(s) and/or Model(s).
Security
the resilience of Product(s) and/or Model(s) against malicious and/or negligent activities that result in Organisational loss of control over concerned Product(s) and/or Model(s).
Selection Function
a description (mathematical, where possible) of the probability with which, or the proportion in which, the real Subjects that could potentially be recorded in a dataset are actually recorded in it.
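When the Subjects that could potentially be recorded are known for each (Sub)population (an assumption that rarely holds exactly; the figures might come from an external reference count), an empirical Selection Function can be estimated as the recorded proportion per (Sub)population. The sketch below uses hypothetical names, and the group labels stand in for whatever attributes define the (Sub)populations.

```python
from collections import Counter


def empirical_selection_function(population_groups, recorded_groups):
    """Estimate, per (Sub)population, the proportion of real Subjects that were recorded.

    population_groups: group label of every Subject that could potentially be
                       recorded (assumed known, e.g. from a reference count)
    recorded_groups:   group label of every Subject actually recorded in the dataset
    Groups that appear only in recorded_groups are ignored.
    """
    population = Counter(population_groups)
    recorded = Counter(recorded_groups)
    return {group: recorded[group] / n for group, n in population.items()}


if __name__ == "__main__":
    sel = empirical_selection_function(["A"] * 100 + ["B"] * 50, ["A"] * 80 + ["B"] * 10)
    print(sel)  # {'A': 0.8, 'B': 0.2}
```

A markedly lower proportion for one (Sub)population, as for group B here, indicates that the dataset under-represents it relative to the real population.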
Stakeholders
the department(s) and/or team(s) within the Organisation that do not conduct data science and/or technical Machine Learning work, but have a material interest in the Machine Learning of the Product(s).
(Sub)population
any group of persons, animals or other entities, each represented by a piece of data, that forms part of a larger (potential) dataset and is characterised by any attribute or combination of attributes. (Sub)populations are particularly important when some of them are vulnerable or protected (Protected Classes).
Systemic Stability
the stability of Organisation, Domain, society and environment(s) as a collective ecosystem.
Target of Interest
the fundamental concept that the Product is truly interested in when all is said and done, even if it is not (objectively) measurable.
Target Variable
the Variable which a Model is made to predict and/or output.
Traceability
the ability to trace, recount, and reproduce Product outcome(s), report(s), intermediate product(s), and other artifact(s), inclusive of Model(s), dataset(s) and codebase(s).
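One ingredient of Traceability is being able to show that a dataset, Model file or other artifact is byte-for-byte the one behind a given outcome. A minimal sketch, assuming artifacts are files on disk, records a SHA-256 digest per artifact in a manifest and later reports which artifacts have changed. The function names and the file `train.csv` are hypothetical; a manifest alone does not make results reproducible, it only makes changes to the recorded artifacts detectable.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each artifact path (as a string) to the SHA-256 digest of its contents."""
    return {str(p): sha256_of_file(p) for p in paths}


def verify_manifest(manifest):
    """List the artifact paths whose current contents no longer match the manifest."""
    return [p for p, digest in manifest.items() if sha256_of_file(p) != digest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"  # hypothetical dataset artifact
        data.write_text("id,label\n1,0\n2,1\n")
        manifest = build_manifest([data])
        print(json.dumps(manifest, indent=2))  # could be stored alongside the outputs
        print(verify_manifest(manifest))  # [] while the file is unchanged
        data.write_text("id,label\n1,1\n2,1\n")
        print(verify_manifest(manifest))  # now lists the path to train.csv
```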
Variables
the different attributes of subjects or systems, which may or may not be measured.