Data must be anonymised or pseudonymised prior to deposit with ISSDA. Anonymisation or pseudonymisation is the responsibility of the Data Depositor.
Definitions
Personal Data as defined under Article 4(1) of GDPR “Any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.
Personal data can be disclosed through two types of identifiers:
Direct identifiers - name, address, telephone number, IP address, email, student ID, PPS no.
Indirect identifiers - information that, in combination with other information, could identify individuals. Examples include sex, gender, age, region, occupation, work place, status in employment, economic activity, occupation status, income, ethnicity, religious affiliation, socio-economic status, marital status, household composition, education level, nationality, mother tongue, rare diseases, etc.
Anonymisation of data as defined by the Data Protection Commission "means processing it with the aim of irreversibly preventing the identification of the individual to whom it relates. Data can be considered effectively and sufficiently anonymised if it does not relate to an identified or identifiable natural person or where it has been rendered anonymous in such a manner that the data subject is not or no longer identifiable."
Pseudonymisation under Article 4(5) of GDPR ‘means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person’.
Anonymous data is not considered personal data. Pseudonymised data, however, is considered personal data.
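As an illustration of the pseudonymisation definition above, the following is a minimal sketch that replaces a direct identifier with a keyed hash (HMAC-SHA256). The field names (`student_id`, `pid`), the helper names and the 16-character pseudonym length are assumptions made for this example, not ISSDA requirements. The secret key is the "additional information" and would need to be stored separately from the data under appropriate technical and organisational measures. The output remains personal data.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    """Generate a random secret key. Store it separately from the dataset."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 16 hex characters for readability (an illustrative choice).
    return digest[:16]


key = make_key()

# Hypothetical records containing a direct identifier.
records = [
    {"student_id": "S1001", "score": 72},
    {"student_id": "S1002", "score": 65},
    {"student_id": "S1001", "score": 80},
]

# Replace the identifier with its pseudonym; the original value is dropped.
pseudonymised = [
    {"pid": pseudonymise(r["student_id"], key), "score": r["score"]}
    for r in records
]
```

The same identifier always maps to the same pseudonym under a given key, so records for one individual can still be linked, while re-identification requires access to the key.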
Further guidance on anonymisation and pseudonymisation is available from the Data Protection Commission.
Prior to depositing data with ISSDA:
- Remove all direct identifiers
- name, address or detailed geographic location including postal code, date of birth, telephone number, IP address, email, student ID, PPS no, passport no. etc.
- Check all indirect identifiers to ensure there are no outliers and that identifiers cannot be combined to identify an individual. Where there are outliers or low numbers of observations it may be necessary to recode or aggregate variables that could allow re-identification in combination with other variables.
- Indirect or quasi-identifiers include sex, gender, age, region, occupation, work place, status in employment, economic activity, occupation status, income, ethnicity, religious affiliation, socio-economic status, marital status, household composition, education level, nationality, mother tongue, rare diseases, etc.
- Remove or check answers to all open-ended questions for direct and indirect identifiers.
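The first two checks in the list above can be sketched as follows: drop the direct identifier fields, then count how many records share each combination of indirect identifiers and flag combinations with few observations. The column names, the sample rows and the threshold of 3 are assumptions made for illustration only; the appropriate threshold depends on the dataset and is not specified here.

```python
from collections import Counter

# Hypothetical column names for this example.
DIRECT_IDENTIFIERS = {"name", "email", "pps_no"}
QUASI_IDENTIFIERS = ("sex", "age_band", "region")


def drop_direct_identifiers(rows, direct=DIRECT_IDENTIFIERS):
    """Return copies of the rows without any direct identifier fields."""
    return [{k: v for k, v in row.items() if k not in direct} for row in rows]


def small_cells(rows, keys=QUASI_IDENTIFIERS, threshold=3):
    """Return quasi-identifier combinations shared by fewer than `threshold` rows."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Made-up sample data.
rows = [
    {"name": "A", "email": "a@example.org", "sex": "F", "age_band": "30-39", "region": "Dublin"},
    {"name": "B", "email": "b@example.org", "sex": "F", "age_band": "30-39", "region": "Dublin"},
    {"name": "C", "email": "c@example.org", "sex": "F", "age_band": "30-39", "region": "Dublin"},
    {"name": "D", "email": "d@example.org", "sex": "M", "age_band": "60-69", "region": "Leitrim"},
]

cleaned = drop_direct_identifiers(rows)
flagged = small_cells(cleaned)
```

Here `flagged` holds the single combination that occurs only once in the sample. Flagged combinations are candidates for recoding or aggregation.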
Techniques for quantitative data
- Banding or aggregating - for continuous variables like age or income to create broader categories
- Top or bottom coding - for extremes at the top or bottom of scale for age, household composition, income or financial variables
- Re-coding or generalisation - for ethnicity, educational attainment, employment, nationality, religion, geographic location, etc., merge detailed subcategories into broader groups.
- Using standard coding frames is recommended where possible to increase interoperability, e.g.
- NUTS2 (Nomenclature of Territorial Units for Statistics) for geographic variables
- ISCED 2011 (International Standard Classification of Education) for levels of education
- ISCO (International Standard Classification of Occupations) for coding occupations.
- Keep a record of all actions taken
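The techniques listed above can be sketched in a few lines. This is a minimal example under stated assumptions: the 10-year band width, the top code of 80 for age, the income cap of 150000 and the education grouping are illustrative choices, and the grouping shown is not an ISCED mapping. A simple list is used to keep a record of the actions taken.

```python
def band_age(age: int, width: int = 10, top: int = 80) -> str:
    """Band age into fixed-width groups, top-coding everyone aged `top` or over."""
    if age >= top:
        return f"{top}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value: float, cap: float) -> float:
    """Replace values above `cap` with `cap`."""
    return min(value, cap)


# Illustrative grouping of detailed categories into broader ones (not ISCED).
EDUCATION_GROUPS = {
    "Bachelor's degree": "Tertiary",
    "Master's degree": "Tertiary",
    "Doctorate": "Tertiary",
    "Leaving Certificate": "Upper secondary",
}

# Made-up sample data.
rows = [
    {"age": 34, "income": 42000, "education": "Master's degree"},
    {"age": 85, "income": 250000, "education": "Doctorate"},
    {"age": 19, "income": 12000, "education": "Leaving Certificate"},
]

recoded = [
    {
        "age_band": band_age(r["age"]),
        "income": top_code(r["income"], cap=150000),
        "education": EDUCATION_GROUPS.get(r["education"], "Other"),
    }
    for r in rows
]

# Record of all actions taken, to accompany the deposited data.
actions = [
    "age: banded into 10-year groups, top-coded at 80+",
    "income: top-coded at 150000",
    "education: detailed categories merged into broader groups",
]
```

In practice the bands, caps and groupings would be chosen after inspecting the distribution of each variable for outliers and small groups.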
Information and guidance on anonymising data is available from:
CESSDA DMEG chapter on anonymisation
UK Data Service anonymising data pages
The UK Anonymisation Network (UKAN) provides an Anonymisation Decision-making Framework, available on its website.
Anonymisation and Personal Data Guidance from the Finnish Social Science Data Archive (FSD)
Data Privacy Handbook from the University of Utrecht
Handbook for data containing personal information from the Swedish National Data Service (SND), which offers guidance on managing personal data in research.
Tools for anonymisation
Amnesia - a tool from OpenAIRE for anonymising data, which allows you to aggregate variables and evaluate the re-identification risk. Amnesia is Java-based and can be downloaded and run locally on your computer.
ARX - open-source software for anonymising sensitive personal data. ARX is Java-based and can be run locally on your computer in a compatible Java environment.
sdcMicro - an R package for anonymising data, which allows you to check disclosure risk by examining combinations of key variables. sdcMicro can be downloaded and run locally on your computer.