Definitions
Introduction
As used in this Best Practice Guideline, the following terms shall have the following meanings where capitalised. All references to the singular shall include references to the plural, where applicable, and vice versa. Any terms not defined or capitalised in this Best Practice Guideline shall hold their plain meaning as used in English and data science.
Dictionary
- Absolute Reproducibility
- a guarantee that any and all results, outputs, outcomes, artifacts, etc. can be exactly reproduced under any circumstances.
- Best Practice Guideline
- this document.
- Confidence Value
- a measure of a Model’s self-reported certainty that the given Output is correct.
- Data Generating Process
- the process, through physical and digital means, by which Records of data are created (usually representing events, objects or persons). The salient point is that any Data Generating Process (DGP) has idiosyncrasies, assumptions, limitations, etc. that cause the created datasets to share these limitations. The DGP is a primary influence on the Selection Function.
- Data Science
- an interdisciplinary field that uses scientific methods, processes, algorithms and computational systems to extract knowledge and insights from structured and/or unstructured data.
- Domain(s)
- the societal and/or commercial environment within which the Product will be and/or is operationalised.
- Edge Case
- an outlier in the space of both input Features and Model Output(s).
- Error Rate
- the frequency of occurrence of errors in the (Sub)population relative to the size of the (Sub)population.
- Evaluation Error
- the difference between the ground truth and a Model’s prediction or output.
- Fairness & Non-Discrimination
- the property of Model(s) and Model outcome(s) to be free from bias against Protected Classes.
- Features
- the different attributes of datapoint(s) as recorded in the data.
- Hidden Variable
- an attribute of a datapoint or an attribute of a system that has a causal relation to other attributes, but is itself not measured or unmeasurable.
- Human-Centric Design & Redress
- orienting Product(s) and/or Model(s) to focus on humans and their environments through promoting human and/or environment centric value(s) and recourse(s) for redress.
- Implementation
- every aspect of the insertion of the Product and Model(s) into, and/or their application to, Organisation systems, infrastructure, processes and culture, Domain(s), and Society.
- Incident
- the occurrence of a technical event that affects the integrity of a Product and/or Model.
- Label
- the Feature that represents the (supposed) ground-truth values corresponding to the Target Variable.
- Machine Learning
- the use and development of computer systems and Model(s) that are able to learn and adapt with minimal explicit human instructions by using algorithms and statistical modelling to analyse, draw inferences, and derive output(s) from data.
- Model
- Machine Learning algorithm(s) and data processing designed, developed, trained and implemented to achieve set output(s), inclusive of datasets used for said purposes unless otherwise stated.
- Organisation
- the concerned juristic entity designing, developing and/or implementing Machine Learning.
- Outcome
- the resultant effect of applying Model(s) and/or Product(s).
- Output
- that which Model(s) produce, typically (but not exclusively) predictions or decisions.
- Performance Robustness
- the propensity of Product(s) and/or Model(s) to retain their desired performance over diverse and wide operational conditions.
- Product
- the collective and broad process of design, development, implementation and operationalisation of Model(s), and associated processes, to execute and achieve Product Definition(s), inclusive of, amongst other things, the integration of such operations and/or Model(s) into organisation product(s), software and/or system(s).
- Product Manager
- a Design Owner and/or Run Owner as identified in the Organisation Best Practice Guideline in Sections 3.1.4. and 3.1.7. respectively.
- Product Team
- the collective group of Organisation employees directly charged with designing, developing and/or implementing the Product.
- Product Subjects
- the entities and/or objects that are represented as data points in dataset(s) and/or Model(s), and who may be the subject of Product and/or Model outcome(s).
- Project Lifecycle
- the collective phases of Product(s) from initiation to termination - such as design, exploration, experimentation, development, implementation, operationalisation, and decommissioning - and their mutual iterations.
- Protected Classes
- (Sub)populations of Product Subjects, typically persons, that are protected by law, regulation, policy or based on Product Definition(s).
- Root Cause Analysis
- the activity and/or report of the investigation into the primary causal reasons for the existence of some behaviour (usually an error or deviation).
- Safety
- real, Domain-based physical harms that result from the application of Product(s) and/or Model(s).
- Security
- the resilience of Product(s) and/or Model(s) against malicious and/or negligent activities that result in Organisational loss of control over concerned Product(s) and/or Model(s).
- Selection Function
- a (where possible, mathematical) description of the probability with which, or the proportion in which, the real Subjects that could potentially be recorded in a dataset are actually recorded in it.
- Stakeholders
- the department(s) and/or team(s) within the Organisation who do not conduct data science and/or technical Machine Learning, but have a material interest in Product(s) Machine Learning.
- (Sub)population
- any group of persons, animal(s), or any other entities represented by a piece of data, that is part of a larger (potential) dataset and characterised by any (combination of) attributes. The importance of (Sub)populations is particularly high when some (Sub)populations are vulnerable or protected (Protected Classes).
- Systemic Stability
- the stability of Organisation, Domain, society and environment(s) as a collective ecosystem.
- Target of Interest
- the fundamental concept that the Product is truly interested in when all is said and done, even if it is something that is not (objectively) measurable.
- Target Variable
- the Variable which a Model is made to predict and/or output.
- Traceability
- the ability to trace, recount, and reproduce Product outcome(s), report(s), intermediate product(s), and other artifact(s), inclusive of Model(s), dataset(s) and codebase(s).
- Variables
- the different attributes of subjects or systems, which may or may not be measured.
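Illustration
The definitions of Error Rate and Selection Function above are quantitative, so a small worked example may help. The sketch below is illustrative only: the function names, the record layout and the sample data are hypothetical and not part of this Best Practice Guideline. It assumes that an error means the Output differs from the Label, and that the number of real Subjects who could have been recorded is known per (Sub)population, which in practice it often is not.

```python
from collections import defaultdict


def error_rates(records):
    """Return the Error Rate per (Sub)population.

    Each record is a (subpopulation, label, output) tuple. An error is
    counted whenever the Output differs from the Label (an assumption).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, label, output in records:
        totals[group] += 1
        if output != label:
            errors[group] += 1
    # Errors relative to the size of each (Sub)population.
    return {group: errors[group] / totals[group] for group in totals}


def selection_proportions(recorded, eligible):
    """Return an empirical Selection Function per (Sub)population.

    `recorded` maps each group to the number of Subjects in the dataset;
    `eligible` maps each group to the number of real Subjects that could
    have been recorded. Groups with no eligible Subjects are skipped.
    """
    return {
        group: recorded.get(group, 0) / count
        for group, count in eligible.items()
        if count > 0
    }


# Hypothetical sample data: (subpopulation, label, output).
records = [
    ("A", 1, 1),
    ("A", 0, 1),
    ("A", 1, 1),
    ("A", 0, 0),
    ("B", 1, 0),
    ("B", 0, 0),
]

rates = error_rates(records)
selection = selection_proportions({"A": 4, "B": 2}, {"A": 8, "B": 10})

print(rates)      # Error Rate per (Sub)population
print(selection)  # proportion of eligible Subjects actually recorded
```

In this sample, (Sub)population B has both a higher Error Rate and a lower recorded proportion than A, which is the kind of pattern that comparing (Sub)populations is meant to surface.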