Separating identifying variables from your data

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is sensitive data?

How can we make data non-sensitive and still useful?

Objectives

First learning objective. (FIXME)

Sensitive data are data that can be used to identify an individual, species, object, or location in a way that introduces a risk of discrimination, harm, or unwanted attention. Major, familiar categories of sensitive data are personal data, health and medical data, and ecological data that may place vulnerable species at risk.

Data are usually separated or de-identified to protect an individual's privacy. According to the Australian Privacy Act 1988, “personal information is de-identified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable”. De-identified information is no longer considered personal information and can be shared. More information on the Commonwealth Privacy Act is available at https://www.legislation.gov.au/Details/C2016C00979

De-identifying aims to allow data to be used by others for publishing, sharing and reuse without the possibility of individuals or locations being re-identified. It may also be used to protect the location of archaeological findings, cultural data or the location of endangered species.

Any identifiers (name, date of birth, address, geospatial location, etc.) should be removed from the main data set and replaced with a code/key. The code/key is then preferably encrypted and stored separately. By storing de-identified data in a secure solution, you are meeting safety, access control, ethical, privacy and funding agency requirements.
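The separation step described above can be sketched in Python. This is a minimal illustration, not a prescribed method: the field names, the made-up records and the helper `separate_identifiers` are all assumptions chosen for the example, and the random code is only one possible way to generate a key.

```python
import secrets

# Hypothetical column names; adjust to the identifiers in your own data set.
IDENTIFIER_FIELDS = ("name", "date_of_birth", "address")


def separate_identifiers(records, identifier_fields=IDENTIFIER_FIELDS):
    """Split records into a de-identified data set and a separate key table."""
    deidentified = []
    key_table = []
    for record in records:
        # A random code carries no information about the person it replaces.
        code = secrets.token_hex(8)
        key_row = {"code": code}
        data_row = {"code": code}
        for field, value in record.items():
            if field in identifier_fields:
                key_row[field] = value
            else:
                data_row[field] = value
        key_table.append(key_row)
        deidentified.append(data_row)
    return deidentified, key_table


# Made-up records, for illustration only.
records = [
    {"name": "Alex Example", "date_of_birth": "1980-01-01",
     "address": "1 Sample St", "blood_pressure": "120/80"},
    {"name": "Sam Sample", "date_of_birth": "1975-06-30",
     "address": "2 Test Ave", "blood_pressure": "135/85"},
]

deidentified, key_table = separate_identifiers(records)
```

Here `deidentified` is the table that could be shared, while `key_table` holds the identifiers and would be the part to encrypt and store separately.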

Re-identifying an individual is possible by recombining the de-identified data set with the identifiers.
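To make the recombination concrete, the sketch below joins a de-identified table back to its identifiers using the shared code. The table layout, the `code` field name and the helper `reidentify` are assumptions for illustration. It also shows why the key must be kept apart: rows whose key entry is unavailable cannot be rejoined.

```python
def reidentify(deidentified, key_table):
    """Rejoin de-identified rows with their identifiers using the shared code."""
    identifiers_by_code = {row["code"]: row for row in key_table}
    rejoined = []
    for row in deidentified:
        identifiers = identifiers_by_code.get(row["code"])
        if identifiers is None:
            # Without the matching key entry the row stays de-identified.
            continue
        rejoined.append({**identifiers, **row})
    return rejoined


# Made-up tables, for illustration only.
deidentified = [{"code": "a1", "blood_pressure": "120/80"},
                {"code": "b2", "blood_pressure": "135/85"}]
key_table = [{"code": "a1", "name": "Alex Example"}]

rejoined = reidentify(deidentified, key_table)
```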

Australian practical guidance for De-identification (ARDC)

The Australian Research Data Commons (ARDC), formerly known as the Australian National Data Service (ANDS), released a fabulous guide on de-identification. The guide is intended for researchers who own a data set and wish to share it safely with fellow researchers or publish it. The guide can be found at https://www.ands.org.au/working-with-data/sensitive-data/de-identifying-data

Here are examples of practical guidelines available nationally:

The Australian Government’s Office of the Australian Information Commissioner (OAIC) and CSIRO Data61 have released a ‘De-identification Decision Making Framework’, which is a “practical guide to de-identification, focussing on operational advice”. The guide will assist organisations that handle personal information to de-identify their data effectively.
The OAIC also provides high-level guidance on de-identification of data and information, outlining what de-identification is, and how it can be achieved. https://www.oaic.gov.au/agencies-and-organisations/guides/de-identification-and-the-privacy-act
The Australian Government’s guidelines for the disclosure of health information include techniques for making a data set non-identifiable and example case studies. https://www.aihw.gov.au/reports-data
Australian Bureau of Statistics’ National Statistical Service Handbook. Chapter 11 contains a summary of methods to maintain privacy.
med.data.edu.au gives information about anonymisation https://www.aihw.gov.au/reports-data
Office of the Information Commissioner Queensland’s guidance on de-identification techniques https://www.oic.qld.gov.au/guidelines/for-government/guidelines-privacy-principles/applying-the-privacy-principles/privacy-and-de-identification

Tips for managing de-identification (ARDC)

Plan de-identification early in the research as part of your data management planning
Retain original unedited versions of data for use within the research team and for preservation
Create a de-identification log of all replacements, aggregations or removals made
Store the log separately from the de-identified data files
Identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags e.g.
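The tips on logging and bracketed replacements can be sketched as follows. This is a hypothetical helper, not part of the ARDC guidance: the function name `replace_with_log`, the log fields and the sample transcript are assumptions for illustration.

```python
def replace_with_log(text, replacements):
    """Replace each identifying term with a bracketed label and log the change."""
    log = []
    for original, label in replacements.items():
        placeholder = f"[{label}]"
        count = text.count(original)
        if count:
            text = text.replace(original, placeholder)
            log.append({"original": original,
                        "replacement": placeholder,
                        "occurrences": count})
    return text, log


# Made-up transcript and replacement labels, for illustration only.
transcript = "I met Dr Jones in Townsville. Dr Jones ran the clinic."
replacements = {"Dr Jones": "doctor 1", "Townsville": "regional city"}

clean_text, log = replace_with_log(transcript, replacements)
```

Because the log records the original terms, it is itself identifying and would be stored separately from `clean_text`, in line with the tip above.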

Management of identifiable data (ARDC)

Data may often need to be identifiable (i.e. contains personal information) during the process of research, e.g. for analysis. If data is identifiable then ethical and privacy requirements can be met through access control and data security. This may take the form of:

Control of access through physical or digital means (e.g. passwords)
Encryption of data, particularly if it is being moved between locations
Ensuring data is not stored in an identifiable and unencrypted format when on easily lost items such as USB keys, laptops and external hard drives.
Taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information.
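As one small example of controlling access by digital means, the sketch below writes a key file that only its owner can read or write. It assumes a POSIX file system, and the file name and contents are made up. It is access control only, not encryption, so it does not replace encrypting data that is moved between locations.

```python
import os
import stat
import tempfile


def write_owner_only(path, content):
    """Write content to path so that only the file owner can read or write it."""
    # 0o600 = read/write for the owner, no access for group or others (POSIX).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Re-apply the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


# Hypothetical key file in a temporary directory, for illustration only.
key_path = os.path.join(tempfile.mkdtemp(), "identifier_key.csv")
write_owner_only(key_path, "code,name\na1,Alex Example\n")
mode = stat.S_IMODE(os.stat(key_path).st_mode)
```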

ANDS’ De-identification Guide collates a selection of Australian and international practical guidelines and resources on how to de-identify datasets.

Attribution:

Australian National Data Service. (2018). ANDS guide: De-identification. Retrieved from https://www.ands.org.au/__data/assets/pdf_file/0003/737211/De-identification.pdf
Australian National Data Service. (2018). Safely sharing sensitive data. Retrieved from https://www.ands.org.au/working-with-data/sensitive-data/sharing-sensitive-data

Key Points

First key point. Brief Answer to questions. (FIXME)

Reproducible Research Things
