Driving AI in a Privacy Protection World
- Chuanjie Wu
- Apr 24, 2022
- 3 min read
- Updated: Apr 26, 2022
Apple ATT
In April 2021, Apple began enforcing a new privacy policy named App Tracking Transparency (ATT) on the iOS 14.5, iPadOS 14.5, and tvOS 14.5 platforms, a change that could prove a milestone in the step into the privacy-first mobile internet era.
> With iOS 14.5, iPadOS 14.5, and tvOS 14.5 and later, you need to receive the user’s permission through the AppTrackingTransparency framework in order to track them or access their device’s advertising identifier. Tracking refers to the act of linking user or device data collected from your app with user or device data collected from other companies’ apps, websites, or offline properties for targeted advertising or advertising measurement purposes. Tracking also refers to sharing user or device data with data brokers.
As described in Apple’s User Privacy and Data Use documentation, this major privacy change will impact the entire digital marketing industry, especially online advertising providers such as Google, Facebook, and Pinterest. More specifically, the ATT framework will curtail identity targeting, attribution measurement, and other critical steps in the current digital advertising process (Apple Is Changing How Digital Ads Work. Are Advertisers Prepared?).
Federated Learning
In recent years, many privacy-focused approaches and techniques have emerged among the tech giants and have been heavily researched and invested in. Federated learning (FL) has been one of the most promising of these techniques for the last few years. It was first introduced by Google in 2016 in Communication-Efficient Learning of Deep Networks from Decentralized Data, alongside related work such as Deep Learning with Differential Privacy. FL enables mobile devices to collaboratively train machine learning models while keeping the raw training data on each user’s device. It addresses the weaknesses of centralized data collection, where sensitive user data accumulates in one place and becomes an attractive attack target. The FL workflow cycle consists of four main steps (Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective):
1. Participant Selection and Global Model Dissemination: The server selects a set of participants that satisfy the requirements to join the training process. It then broadcasts the global ML model (or the global model updates) to the participants for the next training round.
2. Local Computation: Upon receiving the global ML model from the server, each participant updates its current local ML model and then trains the updated model on the local dataset residing on the device.
3. Local Model Aggregation: The server aggregates a sufficient number of locally trained ML models from participants in order to update the global ML model.
4. Global Model Update: The server updates the current global ML model based on the aggregated model parameters obtained in step 3.
This four-step cycle is repeated until the global model reaches sufficient accuracy. Recently, Google AI announced the launch of a production machine learning model trained in a federated setting with a rigorous and publicly stated differential privacy guarantee, meaning that raw data was never collected from devices, in keeping with the principle of data minimization (Federated Learning with Formal Differential Privacy Guarantees).
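To make the cycle concrete, here is a minimal FedAvg-style sketch in NumPy. The linear-regression task, client datasets, and hyperparameters are all hypothetical stand-ins; production FL relies on frameworks such as TensorFlow Federated and layers secure aggregation and differential privacy on top.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local datasets: each client privately holds (features, labels).
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(10)]
global_weights = np.zeros(3)  # shared global linear-regression model

def local_update(weights, X, y, lr=0.01, epochs=5):
    """Step 2: the client trains the received model on its own data only."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

for round_num in range(20):
    # Step 1: select a subset of participants and send them the global model.
    selected = rng.choice(len(clients), size=5, replace=False)
    # Step 2: local computation happens on each selected device.
    local_models = [local_update(global_weights, *clients[i]) for i in selected]
    # Steps 3 and 4: aggregate the local models and update the global model.
    global_weights = np.mean(local_models, axis=0)
```

Note that only model parameters ever leave a device in this sketch; plain FedAvg alone does not give a formal privacy guarantee, which is why the differential privacy work cited above matters.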
Other Privacy-Preserving Techniques in ML
Data anonymization: a technique that removes or hashes sensitive attributes, such as personally identifiable information (PII), so that a data subject cannot be identified in the training dataset (see the hashing sketch after this list).
Differential privacy: an advanced form of the perturbation privacy-preserving technique, in which random noise calibrated by rigorous mathematical measures is added to true outputs (see the Laplace-mechanism sketch below).
Multi-party computation (MPC): its core idea is that a function can be collectively computed over a dataset owned by multiple parties, each using its own inputs, so that no party learns anything about the others’ data except the outputs (see the secret-sharing sketch below).
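As an illustration of anonymization, here is a minimal sketch that replaces PII fields with salted hashes. The record schema, salt, and field names are made up for this example, and truncating the digest yields a pseudonymous ID rather than full, irreversible anonymization.

```python
import hashlib

SALT = b"example-salt"  # in practice, a secret salt stored apart from the data

def anonymize(record, pii_fields=("name", "email")):
    """Replace PII fields with salted SHA-256 digests; keep other fields."""
    out = dict(record)
    for field in pii_fields:
        if field in out:
            digest = hashlib.sha256(SALT + str(out[field]).encode()).hexdigest()
            out[field] = digest[:16]  # truncated digest as a pseudonymous ID
    return out

print(anonymize({"name": "Alice", "email": "alice@example.com", "age": 34}))
```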
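For differential privacy, the Laplace mechanism is the textbook example: adding or removing one record changes a counting query’s answer by at most 1 (sensitivity 1), so Laplace noise with scale 1/ε makes the released count ε-differentially private. The dataset and epsilon below are illustrative only.

```python
import numpy as np

def dp_count(values, predicate, epsilon=0.5):
    """Return an epsilon-DP noisy count of items satisfying `predicate`."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one record changes a count by at most 1
    noise = np.random.default_rng().laplace(0.0, sensitivity / epsilon)
    return true_count + noise

ages = [23, 37, 45, 52, 29, 61]  # made-up dataset
print(dp_count(ages, lambda a: a >= 40))  # noisy answer to "how many are 40+?"
```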
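For MPC, additive secret sharing captures the core idea in a few lines: three hypothetical parties jointly compute the sum of their private inputs, and no individual share or partial sum reveals any single party’s value. Real MPC protocols build richer computations on this primitive.

```python
import secrets

Q = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties=3):
    """Split `value` into n random shares that sum to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

inputs = [12, 30, 7]  # each party's private input
all_shares = [share(x) for x in inputs]
# Party i holds the i-th share of every input and sums its shares locally;
# an individual partial sum reveals nothing about any single input.
partial_sums = [sum(s[i] for s in all_shares) % Q for i in range(3)]
print(sum(partial_sums) % Q)  # combining the partial sums yields 49
```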
Challenges and outlook
Many non-privacy-aware ML algorithms are still widely used today, which means there are still many opportunities where privacy-preserving ML (PPML) can be applied. At the same time, several issues and challenges have surfaced during the development of PPML.
Flexibility. Many PPML techniques are tied to specific scenarios or algorithms and are hard to extend to more general settings, which limits how broadly privacy can be preserved.
Scalability. In the FL example, the computation and communication costs are non-trivial at the current scale. We have to consider algorithmic efficiency when fully launching these techniques into production.