Anonymised data falls outside the GDPR
That sounds attractive: if you properly anonymise data, it is no longer personal data and you don’t need to comply with the GDPR. But there’s a catch. True anonymisation is harder than most people think.
Anonymisation vs. pseudonymisation
This distinction is crucial, and the two terms are often confused.
Pseudonymisation
You replace identifying data with a code. Customer number 12345 instead of John Smith. But somewhere a table exists that links customer number 12345 to John Smith. As long as that link exists, it’s still personal data.
Pseudonymisation is a good security measure, but it is not anonymisation. The GDPR still applies.
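To make the difference concrete, here is a minimal Python sketch of pseudonymisation. The function name `pseudonymise`, the sample customers, and the use of random hex codes are assumptions for illustration, not a prescribed method. The point is the `lookup` table: as long as it exists, the substitution can be undone.

```python
import secrets

def pseudonymise(records, field):
    """Replace `field` with a random code; return (new records, lookup table)."""
    codes = {}  # original value -> code
    pseudonymised = []
    for record in records:
        value = record[field]
        if value not in codes:
            codes[value] = secrets.token_hex(8)  # random code, carries no meaning
        pseudonymised.append({**record, field: codes[value]})
    # The reverse table is the "key": whoever holds it can undo the substitution.
    lookup = {code: value for value, code in codes.items()}
    return pseudonymised, lookup

# Hypothetical sample data
customers = [
    {"name": "John Smith", "city": "Amsterdam"},
    {"name": "Jane Doe", "city": "Haarlem"},
    {"name": "John Smith", "city": "Amsterdam"},
]
pseudonymised, lookup = pseudonymise(customers, "name")

# With the lookup table, every record leads straight back to a person.
reidentified = [lookup[record["name"]] for record in pseudonymised]
```

Deleting the lookup table is necessary for anonymisation but not sufficient: the remaining fields may still single someone out.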
Anonymisation
You remove or modify data such that the person can no longer be re-identified by any means reasonably likely to be used, even by combining the data with other sources. There is no key, no mapping table, no way back.
Anonymised data falls outside the GDPR. You may store and use it without the restrictions of privacy legislation.
Techniques for anonymisation
Generalisation
Replace specific values with broader categories. Instead of “32 years old” you write “30-39 years”. Instead of “Amsterdam” you write “North Holland”.
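As a rough sketch, generalisation could look like this in Python. The helper names and the small city-to-province table are assumptions made up for this example; a real dataset would need a complete mapping.

```python
def generalise_age(age, width=10):
    """Map an exact age to a band such as '30-39 years'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1} years"

# Hypothetical, incomplete mapping for illustration only
CITY_TO_PROVINCE = {
    "Amsterdam": "North Holland",
    "Haarlem": "North Holland",
    "Rotterdam": "South Holland",
}

def generalise_record(record):
    """Return a copy of the record with age and city replaced by broader categories."""
    return {
        "age": generalise_age(record["age"]),
        "region": CITY_TO_PROVINCE.get(record["city"], "Other"),
    }

generalised = generalise_record({"age": 32, "city": "Amsterdam"})
# generalised == {"age": "30-39 years", "region": "North Holland"}
```

The wider the categories, the safer the data but the less detail remains for analysis.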
Suppression
Remove certain fields entirely from the dataset. No name, no address, no date of birth.
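A minimal sketch of suppression, assuming records are plain dictionaries. The field names in `DIRECT_IDENTIFIERS` are hypothetical and would need to match your own schema.

```python
# Hypothetical field names; adjust to your own schema
DIRECT_IDENTIFIERS = ("name", "address", "date_of_birth")

def suppress(record, fields=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the listed fields."""
    return {key: value for key, value in record.items() if key not in fields}

customer = {
    "name": "John Smith",
    "address": "Main Street 1",
    "date_of_birth": "1992-04-01",
    "region": "North Holland",
    "purchases": 14,
}
suppressed = suppress(customer)
# suppressed == {"region": "North Holland", "purchases": 14}
```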
Perturbation
Add noise to the data. Apply small random variations to values, so individual values are no longer accurate but statistical patterns remain intact.
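The sketch below illustrates the idea with Gaussian noise on a list of ages. The function name `perturb`, the noise scale, and the generated sample data are assumptions for illustration. Noise of this kind does not by itself guarantee anonymity.

```python
import random
import statistics

def perturb(values, scale, rng):
    """Add zero-mean Gaussian noise with standard deviation `scale` to each value."""
    return [value + rng.gauss(0, scale) for value in values]

rng = random.Random(42)  # fixed seed so the example is repeatable
ages = [rng.randint(18, 80) for _ in range(1000)]  # hypothetical sample data
noisy_ages = perturb(ages, scale=2.0, rng=rng)

# Individual values change, but the average stays close to the original.
mean_shift = abs(statistics.mean(noisy_ages) - statistics.mean(ages))
```

With 1,000 values and a noise scale of 2, the average typically shifts by a small fraction of a year, while each individual age is off by a year or two.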
Aggregation
Present data only as totals or averages. “42 customers from North Holland” instead of individual records.
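Aggregation can be sketched as a simple group-and-summarise step. The function name `aggregate`, the field names, and the sample orders are assumptions for this example.

```python
from collections import defaultdict

def aggregate(records, group_field, value_field):
    """Return the count and average of `value_field` per group."""
    totals = defaultdict(lambda: [0, 0.0])  # group -> [count, sum]
    for record in records:
        entry = totals[record[group_field]]
        entry[0] += 1
        entry[1] += record[value_field]
    return {
        group: {"customers": count, "average": total / count}
        for group, (count, total) in totals.items()
    }

# Hypothetical sample data
orders = [
    {"region": "North Holland", "order_value": 100.0},
    {"region": "North Holland", "order_value": 50.0},
    {"region": "Utrecht", "order_value": 30.0},
]
summary = aggregate(orders, "region", "order_value")
# summary["North Holland"] == {"customers": 2, "average": 75.0}
```

Note that the Utrecht group contains a single customer, so its "average" is that customer's exact value. Aggregates only protect individuals when the groups are large enough.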
The pitfalls
Re-identification through combination
Even if you remove names and addresses, a combination of age, postcode, and occupation can often lead to re-identification. A well-known US study by Latanya Sweeney estimated that 87% of the population could be uniquely identified by just three characteristics: ZIP code, gender, and date of birth.
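One way to get a feel for this risk is to count how many records are unique on a given combination of fields. The sketch below does that; the function name `unique_fraction` and the four sample records are assumptions for illustration.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` occurs exactly once."""
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Hypothetical sample data without names or addresses
people = [
    {"age_band": "30-39", "postcode": "1011", "occupation": "teacher"},
    {"age_band": "30-39", "postcode": "1011", "occupation": "teacher"},
    {"age_band": "30-39", "postcode": "1011", "occupation": "dentist"},
    {"age_band": "40-49", "postcode": "2011", "occupation": "teacher"},
]

by_age = unique_fraction(people, ["age_band"])                             # 0.25
by_all = unique_fraction(people, ["age_band", "postcode", "occupation"])  # 0.5
```

In this toy dataset, age band alone singles out one record in four. Adding postcode and occupation doubles that to half of all records.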
Small datasets
The smaller the dataset, the harder anonymisation becomes. If your dataset contains only 5 customers from a particular city, an “anonymised” record containing the city is still traceable.
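A simple check for this is to flag every value that appears in fewer than a minimum number of records. The threshold of 10 and the sample data below are assumptions for illustration; the right threshold depends on your data and context.

```python
from collections import Counter

def small_groups(records, field, minimum=10):
    """Return the values of `field` that occur in fewer than `minimum` records."""
    counts = Counter(record[field] for record in records)
    return {value: count for value, count in counts.items() if count < minimum}

# Hypothetical dataset: 40 customers in Amsterdam, 5 on Texel
customers = [{"city": "Amsterdam"}] * 40 + [{"city": "Texel"}] * 5
risky = small_groups(customers, "city")
# risky == {"Texel": 5}
```

Records in flagged groups are candidates for further generalisation (a province instead of a city, for example) or for suppression.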
The test
Ask yourself: can someone with access to other public or commercial datasets re-identify the persons in my dataset? If the answer is yes or maybe, it’s not truly anonymised.
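You can simulate this test by trying to link your own dataset to an external one. The sketch below assumes both datasets are lists of dictionaries that share some fields; the function name `link`, the field names, and all sample data are hypothetical.

```python
def link(dataset, external, shared_fields):
    """Return (record, name) pairs where a record matches exactly one external person."""
    index = {}
    for person in external:
        key = tuple(person[field] for field in shared_fields)
        index.setdefault(key, []).append(person)

    matches = []
    for record in dataset:
        key = tuple(record[field] for field in shared_fields)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a single candidate means the record is traceable
            matches.append((record, candidates[0]["name"]))
    return matches

# "Anonymised" dataset: no names, but still age, postcode, and occupation
dataset = [
    {"age": 32, "postcode": "1011", "occupation": "dentist", "purchases": 14},
    {"age": 45, "postcode": "2011", "occupation": "teacher", "purchases": 3},
]
# Hypothetical public or commercial source
public_register = [
    {"name": "John Smith", "age": 32, "postcode": "1011", "occupation": "dentist"},
    {"name": "A. Jansen", "age": 45, "postcode": "2011", "occupation": "teacher"},
    {"name": "B. de Vries", "age": 45, "postcode": "2011", "occupation": "teacher"},
]

reidentified = link(dataset, public_register, ["age", "postcode", "occupation"])
# One record links to exactly one person: John Smith
```

An empty result from one external source proves little, since an attacker may have access to other sources. A non-empty result, however, is a clear sign that the dataset is not anonymised.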
When to anonymise?
The most common application is retaining data for analysis after the retention period expires. Instead of completely deleting customer data, you anonymise it and keep the statistical value.
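As a sketch of that workflow: once a record is older than the retention period, you keep only a reduced version of it. The seven-year period, the field names, and the function `apply_retention` are assumptions for illustration, and whether the reduced record is truly anonymous still depends on the pitfalls described above.

```python
from datetime import date, timedelta

RETENTION_PERIOD = timedelta(days=7 * 365)  # hypothetical retention period

def apply_retention(records, today, retention=RETENTION_PERIOD):
    """Split records into those still within retention and reduced copies of the rest."""
    active, archive = [], []
    for record in records:
        if today - record["last_order"] > retention:
            # Keep only coarse, non-identifying fields for statistics
            archive.append({
                "region": record["region"],
                "last_order_year": record["last_order"].year,
            })
        else:
            active.append(record)
    return active, archive

# Hypothetical sample data
customers = [
    {"name": "John Smith", "region": "North Holland", "last_order": date(2015, 3, 1)},
    {"name": "Jane Doe", "region": "Utrecht", "last_order": date(2024, 6, 1)},
]
active, archive = apply_retention(customers, today=date(2025, 1, 1))
# archive == [{"region": "North Holland", "last_order_year": 2015}]
```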
Other applications:
- Test environments - use anonymised copies of production data for software testing
- Reports - share trends and statistics without exposing individual data
- Research - analyse patterns without violating individuals’ privacy
GDPRWise helps you set retention periods and track which data you process and when you need to delete or anonymise it.