Security

From The Foundation for Best Practices in Machine Learning





Objective
To (a) prevent adversarial actions against, and encourage graceful failures for, Products and/or Models; (b) avert malicious extraction of Models, data and/or intellectual property; (c) prevent Model based physical and/or irreparable harms; and (d) prevent erosion of trust in Outputs or methods.


17.1. Product Definition(s)

Objective
To identify and control for Adversarial risks and motives based on Product Definition, characterized by adversary goals.
17.1.1. Exfiltration Attacks

Document and assess whether the data employed and gathered by the Product, and the intellectual property generated possess value for potential adversarial actors.

To (a) identify the risks associated with (i) Product Subject physical, financial, social and psychological wellbeing, and (ii) Organization financial wellbeing; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.1.2. Evasion Attacks

Document and assess whether Product Subjects gain advantage from evading and/or manipulating Product Outputs. Document and assess whether adversarial actors stand to gain advantage from manipulating Product Subjects by evading and/or manipulating Product Outputs.

To (a) identify the risks associated with Product Output manipulation in regard to malicious and nefarious motives; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.1.3. Targeted Sabotage

Document and assess whether adversarial actors can cause harm to specific targeted Product Subjects by manipulating Product Outputs.

To (a) identify the risks associated with targeted Product Subject physical, financial, social and psychological wellbeing; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.1.4. Performance Degradation Attack

Document and assess whether a malicious performance degradation for a specific (Sub)population can cause harm to that (Sub)population. Document and assess whether general performance degradation can cause harm to society, Product Subjects, the Organization, the Domain and/or the field of Machine Learning.

To (a) identify the risks in regard to (i) Product Subjects' physical, financial, social and psychological wellbeing, (ii) the Organization's financial and reputational wellbeing, (iii) society-wide environmental, social and economic wellbeing, and (iv) the Domains' reputational wellbeing; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2. Exploration & Development

Objective
To identify and control for Adversarial Risks based on and originating in Model properties and/or Model data properties.
17.2.1. Data Poisoning Assessment

Document and assess the ease and extent with which adversarial actors may influence training data by manipulating and/or introducing (i) raw data; (ii) annotation processes; (iii) new data points; (iv) data gathering systems (such as sensors); (v) metadata; and/or (vi) multiple components thereof simultaneously. If this constitutes an elevated risk, document, assess and implement measures to detect and/or prevent such manipulation of training data.

To (a) prevent adversarial actors from seeding susceptibility to Evasion Attacks, Targeted Sabotage and Performance Degradation Attacks by way of (i) introducing hard to detect triggers, (ii) increasing noise, and/or (iii) occluding or otherwise degrading information content; and (b) highlight associated risks that might occur in the Product Lifecycle.
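One of the detection measures above can be sketched as a simple per-class outlier screen over training features. This is a minimal illustration in Python with NumPy (all function and variable names are ours, not a prescribed API); a real pipeline would combine several detectors.

```python
import numpy as np

def flag_outliers(features, labels, z_threshold=3.0):
    """Flag training points whose features lie far from their class mean.

    A crude screen for some poisoning patterns (e.g. injected points with
    flipped labels); real pipelines combine several such detectors.
    """
    flagged = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        cls_feats = features[idx]
        mean = cls_feats.mean(axis=0)
        std = cls_feats.std(axis=0) + 1e-9  # avoid division by zero
        # Maximum per-dimension z-score for each point in this class.
        z = np.abs((cls_feats - mean) / std).max(axis=1)
        flagged.extend(idx[z > z_threshold].tolist())
    return sorted(flagged)

# Tiny demo: 50 well-behaved points plus one far-off injected point.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(50, 2))
y = np.zeros(50, dtype=int)
X = np.vstack([X, [[25.0, 25.0]]])   # suspicious injected point (index 50)
y = np.append(y, 0)
print(flag_outliers(X, y))           # the injected index is flagged
```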

17.2.2. Public Datasets

Employ, as a benchmark, public datasets whose characteristics and Error Rates are well known, and/or make Product evaluation results public.

To (a) increase the probability of detecting adversarial attacks, such as Data Poisoning, by enabling comparison with and by public resources; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2.3. Data Exfiltration Susceptibility

Document and assess the susceptibility of the Model to data Exfiltration Attacks through - (i) the leakage of (parts of) input data through Model Output; (ii) Model memorization of training data that may be exposed through Model output; (iii) the inclusion by design of (some) training data in stored Model artifacts; and/or (iv) repeated querying of the Model.

To (a) warrant and control the risk of Model data theft; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2.4. Model Exfiltration Susceptibility

Document and assess the susceptibility of Models to Exfiltration Attacks with the aim of obtaining a copy, or approximation of, the Model or other Organization intellectual property, through repeated querying of the Model and analysing the obtained results and confidence scores.

To (a) warrant and control the risk of Model and intellectual property theft; and (b) highlight associated risks that might occur in the Product Lifecycle.
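The repeated-querying attack described above can be illustrated with a toy black-box extraction: an attacker who only observes predicted labels fits a surrogate that largely agrees with the victim. The "victim" model, its weights, and all names below are hypothetical, chosen only to make the sketch self-contained.

```python
import numpy as np

# Hypothetical "victim" model exposed only through a query API: a linear
# classifier whose weights the attacker cannot see.
secret_w = np.array([2.0, -1.0])

def query_victim(x):
    """Black-box API returning only the predicted label."""
    return int(x @ secret_w > 0)

# The attacker samples inputs, records the Outputs, and fits a surrogate.
rng = np.random.default_rng(1)
queries = rng.uniform(-1, 1, size=(2000, 2))
answers = np.array([query_victim(q) for q in queries])

# Least-squares surrogate on the (input, +/-1 label) pairs.
w_hat, *_ = np.linalg.lstsq(queries, answers * 2.0 - 1.0, rcond=None)

# Agreement between surrogate and victim on fresh inputs.
test_inputs = rng.uniform(-1, 1, size=(1000, 2))
agree = np.mean([(t @ w_hat > 0) == query_victim(t) for t in test_inputs])
print(f"surrogate agreement with victim: {agree:.2f}")
```

Defences against this pattern are the subject of 17.2.5 below: limiting query volume and removing information from Outputs directly raise the attacker's cost.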

17.2.5. Exfiltration Defence

To reduce susceptibility to Exfiltration Attacks, (a) make Exfiltration Attacks computationally expensive; (b) remove as much information from Model Outputs as possible; (c) add noise to Model Outputs through techniques such as differential privacy; (d) limit querying possibilities in volume and/or scope; and/or (e) change the Model architecture.

To (a) warrant and control the risk of Exfiltration Attacks; and (b) highlight associated risks that might occur in the Product Lifecycle.
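Measures (c) and (d) can be sketched as a thin wrapper around a scoring function: a per-client query budget plus Laplace noise on the returned score (the mechanism underlying differential privacy). The class and parameter names are illustrative, not a specific library API.

```python
import numpy as np

class GuardedModel:
    """Wrap a model's scoring function with two exfiltration defences:
    a query budget and Laplace noise on the returned score.
    Illustrative names; not a specific library API."""

    def __init__(self, predict_fn, query_budget=100, noise_scale=0.05, seed=0):
        self.predict_fn = predict_fn
        self.budget = query_budget
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)

    def query(self, x):
        if self.budget <= 0:
            raise PermissionError("query budget exhausted")
        self.budget -= 1
        score = self.predict_fn(x)
        noisy = score + self.rng.laplace(0.0, self.noise_scale)
        # Return only a coarse label and a noisy, clipped, rounded score:
        # less information for an attacker reconstructing Model or data.
        return {"label": int(noisy > 0.5),
                "score": round(float(np.clip(noisy, 0.0, 1.0)), 1)}

model = GuardedModel(lambda x: 0.9, query_budget=2)
print(model.query(None))  # e.g. {'label': 1, 'score': 0.9}
model.query(None)
# A third call raises PermissionError: the budget is exhausted.
```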

17.2.6. Adversarial Input Susceptibility

Document and assess the susceptibility of Models to being effectively influenced by manipulated (inferencing) input. Reduce this susceptibility by (a) increasing representational robustness (e.g. through more complete embeddings or latent space representations); and/or (b) applying robust transformations (possibly cryptographic) and input cleaning.

To (a) warrant the control of the risk of Evasion and Sabotage Attacks, including Adversarial Examples; and (b) highlight associated risks that might occur in the Product Lifecycle.
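Measure (b) can be illustrated with feature squeezing, a simple robust transformation that quantizes inputs before inferencing so that low-amplitude adversarial perturbations are rounded away. A minimal sketch (names are illustrative):

```python
import numpy as np

def quantize_input(x, levels=8):
    """Feature squeezing: reduce input precision before inferencing so
    that small adversarial perturbations round to the same value."""
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return np.round(x * (levels - 1)) / (levels - 1)

clean = np.array([0.50, 0.25, 1.00])
perturbed = clean + np.array([0.03, -0.02, -0.04])  # small adversarial shift
# The perturbation is rounded away: both inputs quantize identically.
print(np.allclose(quantize_input(clean), quantize_input(perturbed)))  # True
```

The trade-off is a loss of input precision, so the quantization level should be validated against clean-data performance.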

17.2.7. Filtering Susceptibility

If sufficient potential motive for adversarial attack has been determined, document and assess, in addition to the general Susceptibility Assessment, the specific susceptibility of Models' pre-processing filtering procedures to evasion by tailored inputs, based on the information available to an adversarial attacker about these procedures. Increase the robustness of this filtering as far as practically feasible.

To (a) warrant the control of the risk of Evasion and Sabotage Attacks, including Adversarial Examples; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2.8. Training Susceptibility

If sufficient potential motives for adversarial attack have been determined, document and assess, in addition to the general Susceptibility Assessment, the specific susceptibility of Model training to attack through manipulation of (a) the partitioning of training, validation and test sets, and/or (b) Models' hyperparameters. Implement stricter access control on production-grade training and hyperparameter optimization procedures.

To (a) warrant the control of the risk of Evasion, Sabotage and Performance Degradation Attacks; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2.9. Adversarial Example Susceptibility

If sufficient potential motives for adversarial attack have been determined, document and assess, in addition to the general Susceptibility Assessment, the specific susceptibility of Models to Adversarial Examples by considering (a) sparse or empty regions of the input space, and/or (b) Model architectures. Document and implement specific protective measures, such as but not limited to adversarial training.

To (a) warrant the control of the risk of Evasion Attacks, specifically Adversarial Examples; and (b) highlight associated risks that might occur in the Product Lifecycle.
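Adversarial training can be sketched on a toy logistic-regression model: during training, each batch is perturbed along the sign of the input gradient (FGSM-style) before the weight update. This is an illustration of the technique under assumed toy data, not a prescribed implementation; whether it improves robustness in practice depends on the Model and the attack budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 2-D data: two Gaussian blobs, labels 0 and 1.
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def train(X, y, adversarial=False, eps=0.3, lr=0.1, epochs=200):
    """Logistic regression; optionally trained on FGSM-perturbed inputs."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        Xb = X
        if adversarial:
            # FGSM: shift each input along the sign of the loss gradient
            # w.r.t. the input, then update weights on the perturbed batch.
            grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
            Xb = X + eps * np.sign(grad_x)
        err = sigmoid(Xb @ w + b) - y
        w -= lr * (Xb.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def attacked_accuracy(w, b, X, y, eps=0.0):
    """Accuracy under an FGSM perturbation of strength eps (0 = clean)."""
    grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    Xa = X + eps * np.sign(grad_x)
    return float(np.mean((sigmoid(Xa @ w + b) > 0.5) == y))

w_std, b_std = train(X, y)
w_adv, b_adv = train(X, y, adversarial=True)

print("clean accuracy, standard:  ", attacked_accuracy(w_std, b_std, X, y))
print("under attack, standard:    ", attacked_accuracy(w_std, b_std, X, y, eps=0.5))
print("under attack, adv-trained: ", attacked_accuracy(w_adv, b_adv, X, y, eps=0.5))
```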

17.2.10. Adversarial Defence

If sufficient potential motive and susceptibility to adversarial attacks have been determined, implement as far as reasonably practical (a) data testing methods for detecting outside influence on input and Output Data; (b) reproducibility; (c) redundancy through multimodal input; and/or (d) periodic resets or cleaning of Models and data.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.2.11. General Susceptibility - Information

Document, assess and control the general susceptibility to attack due to information obtainable by attackers, by considering (a) sensitivity to input noise and/or noise as a protective measure; (b) the amount of information an adversarial actor may obtain from over-extensive logging; and/or (c) whether providing confidence scores as Output is beneficial to adversarial actors.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.
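Consideration (c) can be sketched as an output-hardening step that returns only the top label(s) with coarsely rounded confidence scores, limiting the signal available to an adversarial actor. The function name and parameters are illustrative:

```python
def harden_output(probs, top_k=1, decimals=1):
    """Reduce information returned to callers: keep only the top-k labels
    and round confidence scores coarsely."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return {label: round(p, decimals) for label, p in ranked[:top_k]}

raw = {"cat": 0.6231, "dog": 0.3724, "fox": 0.0045}
print(harden_output(raw))           # {'cat': 0.6}
print(harden_output(raw, top_k=2))  # {'cat': 0.6, 'dog': 0.4}
```

The same trade-off as in 17.2.5 applies: legitimate consumers of confidence scores lose precision, so the rounding level should be agreed with Stakeholders.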

17.2.12. General Susceptibility - Exploitability

Document, assess and control the general Model susceptibility to attack due to exploitable Model properties, considering that (a) overfit or highly sensitive Models and Model hyperparameters are easier to attack; (b) over-reliance on gradient methods makes Models more predictable and inspectable; (c) Models may be pushed past their applicability boundaries if input is not validated; and (d) predictable (non-cryptographic) random number generators should be replaced by cryptographically secure random number generators.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.
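Consideration (d) can be illustrated in Python: the standard `random` module is a deterministic Mersenne Twister whose output is fully reproducible from its seed, while `secrets` draws from the operating system's CSPRNG and is suitable for security-sensitive values.

```python
import random
import secrets

# `random` is a deterministic Mersenne Twister: anyone who learns the
# seed (or observes enough outputs) can reproduce every value.
predictable = random.Random(42)
token_weak = predictable.getrandbits(128)

# `secrets` draws from the OS CSPRNG and is appropriate for tokens,
# nonces, and any randomness an adversary must not predict.
token_strong = secrets.randbits(128)

# Reseeding with the same seed reproduces the weak token exactly --
# precisely the property an adversarial actor exploits.
print(random.Random(42).getrandbits(128) == token_weak)  # True
```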

17.2.13. General Susceptibility - Detection

Document, assess and control the capability to detect attacks through the ability to recognize when Model behaviour is anomalous, by (a) decreasing Model opacity, and/or (b) increasing Model robustness.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.
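One way to notice anomalous Model behaviour, as the control above suggests, is to monitor the distribution of Outputs, for example the mean prediction entropy, and alarm on drift from a baseline. A crude sketch (thresholds and names are illustrative; a production monitor would use proper statistical tests):

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of one Output distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def drift_alarm(baseline_mean, window, threshold=0.5):
    """Alarm when mean entropy of recent Outputs drifts from baseline:
    one crude signal that inputs or the Model are being manipulated."""
    window_mean = sum(prediction_entropy(p) for p in window) / len(window)
    return abs(window_mean - baseline_mean) > threshold

# Confident baseline Outputs vs. a suspicious run of near-uniform Outputs.
baseline = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05]]
base_mean = sum(prediction_entropy(p) for p in baseline) / len(baseline)
suspicious = [[0.34, 0.33, 0.33]] * 5
print(drift_alarm(base_mean, suspicious))  # True
```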

17.2.14. Open Source and Transfer Learning Vulnerability

Document the correspondence between potential attack motives and the attack susceptibility posed by using, re-using, or employing for transfer learning open source Models, Model weights and/or Model parameters, through (a) maliciously inserted behaviour and/or code ("trojans"); (b) the ability of an adversarial actor to investigate and attack open source Models unhindered; and/or (c) improper (re-)use. Consider using non-open-source Models or making significant changes aimed at reducing susceptibility.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.3. Production

Objective
To identify and control for Adversarial Risks based on and/or originating in Model production.
17.3.1. IT Security

Apply traditional IT security practices. Areas of particular importance to ML-based systems include (a) backdoor access to the Product, in particular the components identified as vulnerable in other controls; (b) vulnerability of remote host servers; (c) hardened and isolated systems; (d) malicious insiders; (e) man-in-the-middle attacks; and/or (f) denial-of-service.

To (a) warrant and control the risk of Adversarial Attacks in general; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.3.2. Periodic Review and Validation

If the motive and risk for Adversarial Attack are high, perform more stringent and frequent review and validation activities.

To (a) warrant and control the risk of Adversarial Attacks in general by increasing detection probability and fixing vulnerabilities quickly; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.3.3. Input and Output Vulnerability

Document and assess the vulnerability of the Product and related systems to direct manipulation of inputs and Outputs. Direct Output manipulation, where possible, is the most straightforward, cheapest and hardest-to-detect attack.

To (a) create redundancy with input and inferencing hyperparameter susceptibility; and (b) highlight associated risks that might occur in the Product Lifecycle.


17.4. Confidence & Trust

Objective
To identify and control for Adversarial Risks based on and/or originating in Product trust and confidence.
17.4.1. Trust Erosion

Document and assess the potential impact on trust from adversarial and defacement attacks, and establish a strategy to mitigate trust erosion in case of successful attacks.

To (a) prevent erosion of trust in Product Outputs, the Product, the Organization and/or Domains, which could prevent beneficial Products and technologies from being employed; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.4.2. Confidence

Document and assess the degree of over- and under-confidence in Product Outputs among the Product Team, Stakeholder(s) and End Users. Encourage an appropriate level of confidence through education and self-reflection. Note: underconfidence leads users to overrule the Product in unexpected ways, while overconfidence lowers scrutiny and therefore the chance of detecting and preventing attacks.

To (a) balance the risk of compromising Product effects against reduced vigilance; and (b) highlight associated risks that might occur in the Product Lifecycle.

17.4.3. Warning Fatigue

Document and assess the frequency of warnings and alerts provided to Product operators, maintainers and Product Subjects, and refine thresholds and processes so that overexposure to alerts does not lead to their systematic dismissal.

To (a) prevent an overexposure to alerts that can lead to ignoring serious defects and incidents, causing harm; and (b) highlight associated risks that might occur in the Product Lifecycle.
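The threshold refinement described above can be sketched as a simple cooldown policy that suppresses repeats of the same alert within a window, so operators see each distinct problem once instead of a flood. Class and parameter names are illustrative; production systems would also aggregate, dedupe across sources, and rank alerts.

```python
class AlertThrottle:
    """Suppress repeats of the same alert within a cooldown window.
    Illustrative policy sketch, not a complete alerting system."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown = cooldown_seconds
        self.last_raised = {}

    def should_raise(self, alert_key, now):
        """Return True if this alert should reach an operator at time `now`."""
        last = self.last_raised.get(alert_key)
        if last is not None and now - last < self.cooldown:
            return False  # duplicate inside the window: suppress
        self.last_raised[alert_key] = now
        return True

throttle = AlertThrottle(cooldown_seconds=300)
print(throttle.should_raise("disk_full", now=0))    # True  (first alert)
print(throttle.should_raise("disk_full", now=60))   # False (suppressed)
print(throttle.should_raise("disk_full", now=400))  # True  (window expired)
```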