Separating identifying variables from your data
Overview
Teaching: 0 min
Exercises: 0 min
Questions
What is sensitive data?
How can we make data non-sensitive and still useful?
Objectives
First learning objective. (FIXME)
Sensitive data are data that can be used to identify an individual, species, object, or location in a way that introduces a risk of discrimination, harm, or unwanted attention. Major, familiar categories of sensitive data are:
- personal data
- health and medical data
- ecological data that may place vulnerable species at risk
Data are generally separated or de-identified to protect an individual's privacy. According to the Australian Privacy Act 1988, “personal information is de-identified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable”. De-identified information is no longer considered personal information and can be shared. More information on the Commonwealth Privacy Act is available at https://www.legislation.gov.au/Details/C2016C00979
De-identifying aims to allow data to be published, shared and reused by others without the possibility of individuals or locations being re-identified. It may also be used to protect the location of archaeological findings, cultural data, or the location of endangered species.
Any identifiers (name, date of birth, address, geospatial location, etc.) should be removed from the main data set and replaced with a code/key. The key that links each code to its identifiers is then preferably encrypted and stored separately. By storing de-identified data in a secure solution, you are meeting safety, access control, ethical, privacy and funding agency requirements.
Re-identifying an individual is possible by recombining the de-identified data set with the identifiers.
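The separation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: the field names, the `split_identifiers` helper and the `P-` code format are all hypothetical, and encrypting the key table is left out.

```python
import secrets

# Hypothetical list of fields treated as direct identifiers.
IDENTIFYING_FIELDS = ("name", "date_of_birth", "address")


def split_identifiers(records, identifying_fields=IDENTIFYING_FIELDS):
    """Split records into de-identified rows and a separate key table."""
    deidentified_rows = []
    key_table = {}
    for record in records:
        # Use a random code so the code itself reveals nothing about the person.
        code = "P-" + secrets.token_hex(4)
        while code in key_table:  # regenerate on the unlikely collision
            code = "P-" + secrets.token_hex(4)
        # The key table maps each code to the identifiers that were removed.
        key_table[code] = {
            field: record[field] for field in identifying_fields if field in record
        }
        # The de-identified row keeps the code and every non-identifying field.
        row = {"code": code}
        row.update(
            {k: v for k, v in record.items() if k not in identifying_fields}
        )
        deidentified_rows.append(row)
    return deidentified_rows, key_table


# Made-up example records.
records = [
    {"name": "Alex Example", "date_of_birth": "1980-01-31",
     "address": "1 Sample St", "blood_pressure": 120},
    {"name": "Sam Sample", "date_of_birth": "1975-06-15",
     "address": "2 Example Rd", "blood_pressure": 135},
]

deidentified, keys = split_identifiers(records)
```

Here `deidentified` is the data set that could be shared, while `keys` is the part to encrypt and store separately. Random codes are used rather than codes derived from the identifiers (for example initials or a birth year), so that the code cannot be used to guess who a row belongs to.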
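To make the re-identification risk concrete, the following Python sketch shows how little is needed once both parts are in the same hands. The data, the codes and the `reidentify` helper are hypothetical and only illustrate the idea of joining on the code/key.

```python
def reidentify(deidentified_rows, key_table):
    """Re-attach identifiers to de-identified rows by looking up each code."""
    combined = []
    for row in deidentified_rows:
        identifiers = key_table[row["code"]]
        # Merge the stored identifiers back into the row.
        combined.append({**identifiers, **row})
    return combined


# Made-up de-identified data set and separately stored key table.
deidentified_rows = [
    {"code": "P-001", "blood_pressure": 120},
    {"code": "P-002", "blood_pressure": 135},
]
key_table = {
    "P-001": {"name": "Alex Example"},
    "P-002": {"name": "Sam Sample"},
}

combined = reidentify(deidentified_rows, key_table)
```

Because a simple lookup is enough to undo the de-identification, the key table needs at least the same protection as the original identifiable data.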
Australian practical guidance for De-identification (ARDC)
The Australian Research Data Commons (ARDC), formerly known as the Australian National Data Service (ANDS), released a fabulous guide on de-identification. The guide is intended for researchers who own a data set and wish to share it safely with fellow researchers or publish it. The guide is available at https://www.ands.org.au/working-with-data/sensitive-data/de-identifying-data
Here are examples of practical guidelines available nationally:
- The Australian Government’s Office of the Australian Information Commissioner (OAIC) and CSIRO Data61 have released a ‘De-identification Decision Making Framework’, which is a “practical guide to de-identification, focussing on operational advice”. The guide will assist organisations that handle personal information to de-identify their data effectively.
- The OAIC also provides high-level guidance on de-identification of data and information, outlining what de-identification is, and how it can be achieved. https://www.oaic.gov.au/agencies-and-organisations/guides/de-identification-and-the-privacy-act
- The Australian Government’s guidelines for the disclosure of health information include techniques for making a data set non-identifiable and example case studies. https://www.aihw.gov.au/reports-data
- Australian Bureau of Statistics’ National Statistical Service Handbook. Chapter 11 contains a summary of methods to maintain privacy.
- med.data.edu.au gives information about anonymisation https://www.aihw.gov.au/reports-data
- Office of the Information Commissioner Queensland’s guidance on de-identification techniques https://www.oic.qld.gov.au/guidelines/for-government/guidelines-privacy-principles/applying-the-privacy-principles/privacy-and-de-identification
Tips for managing de-identification (ARDC)
- Plan de-identification early in the research as part of your data management planning
- Retain original unedited versions of data for use within the research team and for preservation
- Create a de-identification log of all replacements, aggregations or removals made
- Store the log separately from the de-identified data files
- Identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags.
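The tips on bracketed replacements and keeping a log can be combined in a short Python sketch. It is only an illustration: the transcript, the replacement labels and the `replace_and_log` helper are hypothetical, and a real project would also need to decide how to find the terms to replace.

```python
def replace_and_log(text, replacements):
    """Replace identifying terms with [bracketed] labels and log each change."""
    log = []
    for original, label in replacements.items():
        marker = f"[{label}]"
        occurrences = text.count(original)
        if occurrences:
            text = text.replace(original, marker)
            # Each log entry records what was replaced, with what, and how often.
            log.append(
                {
                    "original": original,
                    "replacement": marker,
                    "occurrences": occurrences,
                }
            )
    return text, log


# Made-up interview transcript and replacement labels.
transcript = "I met Dr Smith in Townsville. Dr Smith was helpful."
replacements = {"Dr Smith": "doctor 1", "Townsville": "city A"}

clean_text, log = replace_and_log(transcript, replacements)
print(clean_text)
```

The log contains the original identifying terms, which is why it should be stored separately from the de-identified files.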
Management of identifiable data (ARDC)
Data may often need to be identifiable (i.e. contains personal information) during the process of research, e.g. for analysis. If data is identifiable then ethical and privacy requirements can be met through access control and data security. This may take the form of:
- Control of access through physical or digital means (e.g. passwords)
- Encryption of data, particularly if it is being moved between locations
- Ensuring data is not stored in an identifiable and unencrypted format when on easily lost items such as USB keys, laptops and external hard drives.
- Taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information.
Safely sharing sensitive data guide (ARDC)
- ANDS’ De-identification Guide collates a selection of Australian and international practical guidelines and resources on how to de-identify datasets.
Attribution:
- Australian National Data Service. (2018). ANDS guide: De-identification. Retrieved from https://www.ands.org.au/__data/assets/pdf_file/0003/737211/De-identification.pdf
- Australian National Data Service. (2018). Safely sharing sensitive data. Retrieved from https://www.ands.org.au/working-with-data/sensitive-data/sharing-sensitive-data
Key Points
First key point. Brief Answer to questions. (FIXME)