Documentation

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is documentation?

Objectives

Describe what needs to be documented.

Documentation

Documentation means recording the procedures for your experiment so that an outsider could understand the workings of your lab. This can include where your results and working data are saved.

If you have a lab notebook, copy it into a digital format and save it in a safe place (such as research storage).
Make sure it is saved somewhere that’s accessible to your supervisor/team.

Bus Factor

Have you got a new staff member coming on board? They are a prime candidate to collate information and write documentation, as doing so will help them become familiar with the team and learn how the lab/team works.

Note: Ideally you want to document anything that a lab member coming on board would need to know. Documentation is all about changing your Bus Factor - the number of people on a project who would need to be hit by a bus for the project to fail. Many projects have a bus factor of one. Adding documentation means that when someone goes on leave, needs to take leave suddenly or finishes their study, their work is preserved for your lab.

Documentation helps with reproducible science

Documentation will also be important for any audits in your lab or if someone would like to reproduce your research.

“Documentation is a love letter to your future self” - Damian Conway

How do we start? - Beginners

Read this first: How to start Documenting and more by CESSDA ERIC. Start with documenting in a text file or document - any start is a good start. Have this document automatically synced to the cloud with your data or keep this in a shared place that your organisation supports and recommends.

How do we start? - Intermediate

Once you have the basics in place, go into detail on how your workflow goes from your raw data to the finished results. This can be anything from a downloaded function list from SPSS/Virtual Lab to the code used to create it.

How do we start? - Advanced

Now that you’ve got a good head start, time to learn about Git Repositories and wikis.

External Resources

Key Points

Documentation means recording the procedures for your experiment so that an outsider could understand how to reproduce it. This can include where your results and working data are saved.

Naming conventions

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is a File Naming Convention?

What is a File Name?

What are the benefits of using a file naming convention?

Objectives

First learning objective. (FIXME)

What is a File Naming Convention?

A File Naming Convention (FNC) is a framework or protocol for naming your files in a way that describes what the files contain and, importantly, how they relate to other files. It is essential to establish an agreed FNC before collecting data.

What is a File Name?

File names are the names that are listed in the file directory and that team members give to new files when they are saved for the first time.

What are the benefits of using a file naming convention?

Naming files consistently, logically and predictably protects against disorganised files and misplaced or lost data. It can also prevent backlogs or project delays. A file naming convention will ensure files are:

Easier to process - team members won’t have to overthink the file naming process
Easier to access, retrieve and store
Easier to browse, saving time and effort
Harder to lose!
Easier to check for obsolete or duplicate records

Having logical and known naming conventions in place can also help you with version control (See Version Control for more information).

Checklist

The University of Edinburgh has a comprehensive and easy to follow list (with examples and explanations) of 13 Rules for file naming conventions

Coming up with a plan for your team on how to name files.

Former PhD student and subsequent founder of the Figshare platform, Mark Hahnel, typified a common challenge: ‘During my PhD I was never good at managing my research data. I had so many different file names for my data that I always struggled to find the correct file quickly and easily when it was requested. My former PI was so horrified upon seeing the state of my data organisation that she held an emergency lab book meeting with the rest of my group when I was leaving’. - Research Information, April/May 2014

Your research team should agree on the following elements of a file name prior to data collection:

Vocabulary - choose a standard vocabulary for file names, so that everyone uses a common language
Punctuation - decide when to use punctuation symbols, capitals and hyphens
Dates - agree on a logical use of dates so that they display chronologically i.e. YYYY-MM-DD
Order - confirm which element should go first, so that files on the same theme are listed together and can therefore be found easily
Numbers - specify the number of digits that will be used in numbering so that files are listed numerically e.g. 01, 002, etc.

As previously suggested, consistent and meaningful naming of files and folders can make everyone’s life easier. See this example below:

YYYYMMDD_SiteA_SensorB.CSV (Date, Location, Sensor)

Which, when applied, would look like this:

20150621_Yaouk_Humidity.CSV
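The pattern above can be generated rather than typed by hand, which keeps the date format and element order consistent. The following is a minimal sketch, assuming a hypothetical helper named build_file_name; the function name and its parameters are illustrative and are not part of the lesson.

```python
from datetime import date


def build_file_name(collected_on: date, site: str, sensor: str, extension: str = "csv") -> str:
    """Return a name in the form YYYYMMDD_Site_Sensor.ext."""
    # strftime("%Y%m%d") zero-pads the month and day, so names sort chronologically.
    date_part = collected_on.strftime("%Y%m%d")
    return f"{date_part}_{site}_{sensor}.{extension}"


print(build_file_name(date(2015, 6, 21), "Yaouk", "Humidity", "CSV"))
# prints 20150621_Yaouk_Humidity.CSV
```

Putting the date first means that files from the same day are listed together, whichever site or sensor they came from.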

Some characters have special meaning to the operating system, so avoid using them when you are naming files. These characters include the following: / \ " ' * ; - ? [ ] ( ) ~ ! $ { } < > # @ & space tab newline

Source: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.osdevice/filename_conv.htm
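A quick check for these characters can be scripted. The sketch below assumes a hypothetical helper named find_special_characters, and its character set is copied from the list above. Your team may choose to adjust that set, for example if you agree to allow hyphens in dates.

```python
# Characters copied from the list above; adjust the set to suit your team's convention.
SPECIAL_CHARACTERS = set("/\\\"'*;-?[]()~!${}<>#@& \t\n")


def find_special_characters(file_name: str) -> list[str]:
    """Return the special characters found in file_name, in order of first appearance."""
    found = []
    for character in file_name:
        if character in SPECIAL_CHARACTERS and character not in found:
            found.append(character)
    return found


print(find_special_characters("20150621_Yaouk_Humidity.CSV"))  # prints []
print(find_special_characters("my data (final).csv"))  # prints [' ', '(', ')']
```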

Naming conventions - Beginner

Let’s look at some naming conventions for your data files and documents. Dates are best stored as YYYY-MM-DD. Try to avoid spaces in your file names.

Intermediate

Make sure you follow the 13 Rules for file naming conventions

Naming conventions - Advanced

Do you have a policy in your team around naming conventions? If not, this is a great way of getting everyone on the same page.

Internal Resources

Talk to your Research Support Services librarian.

External Resources

Naming things by Jenny Bryan
File naming and folder conventions by CESSDA ERIC
The University of Edinburgh has a comprehensive yet easy to follow list (with examples and explanations) of 13 Rules for file naming conventions https://www.ed.ac.uk/records-management/guidance/records/practical-guidance/naming-conventions
Australian National Data Service (ANDS). (2018). ANDS Guide: File wrangling

Key Points

A File Naming Convention is a framework for naming your files in a way that describes what files contain and how they relate to other files.

Folder structure

Overview

Teaching: 0 min
Exercises: 0 min

Questions

Why is a folder structure helpful?

Objectives

Describe what needs to be documented.

Folder structure

Having a standard folder structure can keep your files neat and tidy and save you time looking for data. It can also help when you are sharing files with colleagues, by giving everyone a standard place to put working data and documentation.

Like files, folders can also follow a naming convention. By prefixing names with numbers, you can force your files and folders to be ordered by the steps in your workflow. Probably the simplest way to document your structure - for your future reference - is to add a “README” file - a text file outlining the contents of the folder.

A folder structure might look like this: [image: folder structure]
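A numbered structure with a README can also be created by a short script, so every new project starts from the same layout. This is a sketch only: the folder names in FOLDERS and the helper create_project_structure are hypothetical examples, not the structure shown in the lesson's image.

```python
from pathlib import Path

# Hypothetical folder names; the numeric prefixes keep them in workflow order.
FOLDERS = ["01_raw_data", "02_working_data", "03_analysis", "04_results", "05_documentation"]


def create_project_structure(project_root: Path) -> list[Path]:
    """Create the numbered folders and a README.txt that lists them."""
    created = []
    for name in FOLDERS:
        folder = project_root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    readme_lines = ["Folder contents", ""] + [f"{name}/" for name in FOLDERS]
    (project_root / "README.txt").write_text("\n".join(readme_lines) + "\n", encoding="utf-8")
    return created
```

Calling create_project_structure(Path("my_project")) would create the folders under my_project. The README is deliberately plain text so that anyone can open it.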

How to develop a folder structure

To develop a logical structure for your team, you need to consider the following points:

Check to make sure there are no pre-existing folder structure agreements
Name folders appropriately and in a meaningful manner. Don’t use staff names and consider using the type of work
Consistency - make sure you use the agreed structure/hierarchy
Structure folders hierarchically - start with a limited number of folders for the broader topics, and then create more specific folders within these
Separate ongoing and completed work - as you start to create lots of folders and files, it is a good idea to start thinking about separating your older documents from those you are currently working on
Backup - ensure folders and files are backed up and retrievable in the event of a disaster. Griffith, like most universities, has safe storage solutions.
Clean up folders and files post project.

Beginner

Pick a dataset and illustrate how you currently organise your files. (For the artists: Draw a picture that describes your current approach to file organisation)
See if you can devise a better naming convention or note one or two improvements you could make to how you name your files

There are some really good folder templates around. Here’s one you are welcome to download and use: URL. Or another you could try out, if you prefer, from http://nikola.me/

Advanced

Come up with a policy for your group for folder structures. You could create a template and put it in a shared location so that group members can download it to get started.

External Resources

Key Points

Having a standard folder structure can keep your files neat and tidy and save you time looking for data. It can also help when you are sharing files with colleagues, by giving everyone a standard place to put working data and documentation.

Automation

Overview

Teaching: 0 min
Exercises: 0 min

Questions

How can you automate any repetitive tasks?

Objectives

First learning objective. (FIXME)

Tasks that a human has to do over and over again are opportunities for human error to sneak in. Setting up an automated way of doing them can reduce this risk. Anything from an Excel formula or macro to coding in a data science framework can help.

Ways you can automate things:

Spreadsheet macros and formulas
macOS - Automator
Windows 10 - Task Scheduler
Microsoft Flow or Google Apps Script
Learning to code in Python or R - talk to your local hacky hour or Software Carpentry people

Beginner

Let’s think about the repetitive tasks that you could automate - do you always rename files the same way? Do you manually copy files across?
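As an illustration of automating one such task, the sketch below renames every matching file in a folder by adding a date prefix. The helper add_date_prefix and its parameters are hypothetical; try a script like this on a copy of your files first.

```python
from pathlib import Path


def add_date_prefix(folder: Path, date_prefix: str, pattern: str = "*.csv") -> list[Path]:
    """Rename every matching file in folder to '<date_prefix>_<original name>'."""
    renamed = []
    # sorted() collects the file list before any renaming starts.
    for path in sorted(folder.glob(pattern)):
        if path.name.startswith(f"{date_prefix}_"):
            continue  # already renamed, so re-running the script is safe
        target = path.with_name(f"{date_prefix}_{path.name}")
        path.rename(target)
        renamed.append(target)
    return renamed
```

For example, add_date_prefix(Path("02_working_data"), "20150621") would turn a.csv into 20150621_a.csv and leave files that do not match the pattern untouched.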

Advanced

Could you code up your work so it’s completely automated?

Key Points

First key point. Brief Answer to questions. (FIXME)

Versioning

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is a version control system?

Are you keeping track and logs of your analysis?

Objectives

First learning objective. (FIXME)

Version control system

A version control system allows you to keep track of changes to your data or process.

Are you keeping track of any versions or logs made by the software in use?

Make sure you have a copy of every step you have completed and, if possible, version numbers for the program you are using and any libraries. Programs change over time, and this can alter the results when someone tries to replicate your work after publication.

Never make alterations to your raw data files

Instead, make a copy of the raw data files and keep them somewhere safe (like Research Vault). That way, if you need to redo your work or you find an error earlier in your workflow, you have an original baseline to start from.
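One way to work from a copy while leaving the original untouched is sketched below. It copies a raw file into a working folder and compares SHA-256 checksums to confirm the copy is identical. The helpers sha256_of and copy_raw_file are hypothetical names, and the checksum comparison is an added safeguard rather than something the lesson prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_raw_file(raw_file: Path, working_folder: Path) -> Path:
    """Copy raw_file into working_folder and confirm the copy is identical."""
    working_folder.mkdir(parents=True, exist_ok=True)
    working_copy = Path(shutil.copy2(raw_file, working_folder / raw_file.name))
    if sha256_of(working_copy) != sha256_of(raw_file):
        raise IOError(f"Copy of {raw_file} does not match the original")
    return working_copy
```

All later edits are then made to the working copy, and the recorded checksum of the raw file can be used at any time to confirm that the baseline has not changed.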

Write down versions of analysis software

Write down the versions of your analysis software (such as SPSS or NVivo) AND hardware (such as MRI machines). Your documentation is a great place for this, but even just your lab notebook will work.

Random Number Generator

If you are using random numbers in your research, save the seed of your random number generator as part of your working data. This way, you can later reproduce your results.
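A minimal sketch of recording a seed with Python's standard random module follows. The helper run_with_recorded_seed and the JSON file layout are hypothetical; other tools (R, NumPy, SPSS) have their own ways of setting a seed.

```python
import json
import random
from pathlib import Path


def run_with_recorded_seed(seed: int, seed_file: Path) -> list[float]:
    """Record the seed in seed_file, then draw three example numbers."""
    seed_file.write_text(json.dumps({"random_seed": seed}), encoding="utf-8")
    generator = random.Random(seed)  # a dedicated generator, independent of global state
    return [generator.random() for _ in range(3)]
```

Running the function twice with the same seed returns the same three numbers, which is what makes a later re-run of the analysis comparable with the original.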

Beginner

Copy your raw data to a cloud storage solution such as Research Vault for safe keeping.

Intermediate

If you are using a workflow program (Galaxy, KNIME, or a virtual lab like EcoCloud or the TINKER Humanities, Arts and Social Science Virtual Lab), you can copy your workflow and save it as part of your documentation. Write down the date that you ran the workflow if versions of the software are not available.

Advanced 1

If you are writing scripts (R/Python/Matlab etc), use Git.

Note: Griffith has a GitLab instance you can use for private repositories. Also record the version of R/Python/Matlab, the operating system you are using and the version numbers of any libraries you are using.
If you are using the HPC, also record the version of any modules you used there.
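For Python scripts, the versions mentioned above can be collected automatically and saved with your documentation. This is a sketch using only the standard library; the helper describe_environment is a hypothetical name, and the library names passed to it are examples.

```python
import platform
import sys
from importlib import metadata


def describe_environment(libraries: list[str]) -> dict[str, str]:
    """Collect the Python, operating system and library versions as text."""
    report = {
        "python": sys.version.split()[0],
        "operating_system": platform.platform(),
    }
    for name in libraries:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Example library names; replace them with the ones your script imports.
for item, version in describe_environment(["numpy", "pandas"]).items():
    print(f"{item}: {version}")
```

The printed lines can be pasted into your documentation or written to a text file that is committed to Git alongside the script.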

Advanced 2

If you’ve heard of Docker or Singularity and you are interested, come talk to hacky hour/eResearch Services

External Resources

Key Points

A version control system allows you to keep track of changes to your data or process.

Cloud Storage of your Data

Overview

Teaching: 0 min
Exercises: 0 min

Questions

Key question (FIXME)

Objectives

First learning objective. (FIXME)

Keep a copy of your data on the cloud

Keeping a copy of all your data (working, raw and completed) in the cloud is incredibly important. This ensures that if you have a computer failure, accidentally delete your data or your data is corrupted, your research is restorable.

Griffith has three different types of cloud storage made especially for research

Research Drive

This would be a good place for your day-to-day working files. It is unlimited and you can share it with people at Griffith (but not externally). This works the same as G drive.

Research Space

This has a ‘sync’ client that automatically copies your files from your computer to the cloud, just like Dropbox or Google Drive. You can use this to share with people external to the university. You can add them with a LinkedIn profile, a Griffith, other university or Gmail account, or you can share with a URL, password and expiry date. This is also unlimited storage: you are given 5GB initially, and to add an unlimited folder, just click ‘Add more storage’.

Research Vault

This is for your long-term backups. It is the perfect place to store a safe copy of your raw data, or the research of a PhD student who has completed and is leaving the institute.

Not sure which one is best? Click here

Beginner

Get your data into Research Storage - If you need help picking one, talk to the library or eResearch Support

Advanced

Build a policy for your team or group on where things are stored. Make sure the location of your data is saved in your documentation

Key Points

First key point. Brief Answer to questions. (FIXME)

Computer Security

Overview

Teaching: 0 min
Exercises: 0 min

Questions

Key question (FIXME)

Objectives

First learning objective. (FIXME)

Security

Ensuring that your computer and network are secured means that you have far less chance of a data breach or hack.

Beginner

Have good strong passwords and encrypt your computer’s hard drive

Intermediate

Get set up on a password manager

Advanced

Make sure every computer in your lab/office is encrypted and that everyone is practising safe habits. Note: The boss’s computer is usually the most insecure.

Encrypt your computer

Encryption: https://www.griffith.edu.au/about-griffith/cybersecurity/data-protection
Win 10 Encryption: https://www.windowscentral.com/how-use-bitlocker-encryption-windows-10
Win 7 Encryption: https://www.microsoft.com/en-au/download/details.aspx?id=4794 (Call 55555 first and ask their advice, as they can help you install this - it doesn’t look as simple as Win 10)
macOS Encryption: https://support.apple.com/en-au/HT204837

Strong passwords

https://www.griffith.edu.au/passwords/password-management
Video: https://youtu.be/PjHc8g8G9MU
Find out if your email has been compromised https://haveibeenpwned.com/
Use a password manager such as https://www.lastpass.com/business-password-manager
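A password manager will normally generate strong passwords for you. For illustration only, the sketch below shows how a random password can be produced with Python's secrets module, which is designed for security-sensitive random values. The helper generate_password and the default length of 20 are assumptions, not a Griffith recommendation.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice uses the operating system's secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_password())
```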

Use Multi-Factor Authentication when the option is available (for example, signing in with a password plus a PIN emailed to your account)

Avoid unsecured wifi - if it’s available, Eduroam is usually a better option than free wifi/cafe wifi

Use a VPN whenever you’re not at work

https://intranet.secure.griffith.edu.au/computing/remote-access/accessing-resources/virtual-private-network (55 555 can help you out too)

Keep your OS and products up to date (especially your web browser)

You can use Qualys BrowserCheck to confirm your browser is set up securely
https://www.griffith.edu.au/about-griffith/cybersecurity/cybersecurity-at-home

Griffith provides Symantec anti-virus FREE for Griffith staff and students https://intranet.secure.griffith.edu.au/computing/software/self-help-and-support/software-download-service4

Key Points

First key point. Brief Answer to questions. (FIXME)

Separating identifying variables from your data

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is sensitive data?

How can we make data non-sensitive and still useful?

Objectives

First learning objective. (FIXME)

Sensitive data are data that can be used to identify an individual, species, object, or location in a way that introduces a risk of discrimination, harm, or unwanted attention. Major, familiar categories of sensitive data are personal data, health and medical data, and ecological data that may place vulnerable species at risk.

Separating or de-identifying your data is generally done to protect an individual’s privacy. According to the Australian Privacy Act 1988, “personal information is de-identified if the information is no longer about an identifiable individual or an individual who is reasonably identifiable”. De-identified information is no longer considered personal information and can be shared. More information on the Commonwealth Privacy Act can be found at https://www.legislation.gov.au/Details/C2016C00979

De-identifying aims to allow data to be used by others for publishing, sharing and reuse without the possibility of individuals or locations being re-identified. It may also be used to protect the location of archaeological findings, cultural data or the location of endangered species.

Any identifiers (name, date of birth, address, geospatial locations etc.) should be removed from the main data set and replaced with a code/key. The code/key is then preferably encrypted and stored separately. By storing de-identified data in a secure solution, you are meeting safety, control, ethical, privacy and funding agency requirements.

Re-identifying an individual is possible by recombining the de-identified data set and the identifiers.
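The separation described above can be sketched in a few lines. The helpers split_identifiers and reidentify are hypothetical, and each record is assumed to be a plain dictionary. The sketch only removes the named identifier fields. It does not encrypt the key table or assess whether the remaining fields could still identify someone.

```python
import uuid


def split_identifiers(records, identifier_fields):
    """Split records into de-identified rows and a separate code/key table."""
    deidentified, key_table = [], []
    for record in records:
        code = uuid.uuid4().hex  # random code that carries no personal information
        identifiers = {field: record[field] for field in identifier_fields}
        remaining = {field: value for field, value in record.items()
                     if field not in identifier_fields}
        deidentified.append({"code": code, **remaining})
        key_table.append({"code": code, **identifiers})
    return deidentified, key_table


def reidentify(deidentified, key_table):
    """Recombine the two tables using the shared code."""
    lookup = {row["code"]: row for row in key_table}
    return [{**lookup[row["code"]], **row} for row in deidentified]
```

The de-identified rows can be shared or analysed, while the key table is stored separately (and preferably encrypted), because anyone holding both tables can call reidentify.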

Australian practical guidance for De-identification (ARDC)

The Australian Research Data Commons (ARDC), formerly known as the Australian National Data Service (ANDS), released a guide on de-identification. The guide is intended for researchers who own a data set and wish to share it safely with fellow researchers or to publish it. The guide can be found at https://www.ands.org.au/working-with-data/sensitive-data/de-identifying-data

Here are examples of practical guidelines available nationally

The Australian Government’s Office of the Australian Information Commissioner (OAIC) and CSIRO Data61 have released a ‘De-identification Decision Making Framework’, which is a “practical guide to de-identification, focussing on operational advice”. The guide will assist organisations that handle personal information to de-identify their data effectively.
The OAIC also provides high-level guidance on de-identification of data and information, outlining what de-identification is, and how it can be achieved. https://www.oaic.gov.au/agencies-and-organisations/guides/de-identification-and-the-privacy-act
The Australian Government’s guidelines for the disclosure of health information, includes techniques for making a data set non-identifiable and example case studies. https://www.aihw.gov.au/reports-data
Australian Bureau of Statistics’ National Statistical Service Handbook. Chapter 11 contains a summary of methods to maintain privacy.
med.data.edu.au gives information about anonymisation https://www.aihw.gov.au/reports-data
Office of the Information Commissioner Queensland’s guidance on de-identification techniques https://www.oic.qld.gov.au/guidelines/for-government/guidelines-privacy-principles/applying-the-privacy-principles/privacy-and-de-identification

Tips for managing de-identification (ARDC)

Plan de-identification early in the research as part of your data management planning
Retain original unedited versions of data for use within the research team and for preservation
Create a de-identification log of all replacements, aggregations or removals made
Store the log separately from the de-identified data files
Identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags e.g.

Management of identifiable data (ARDC)

Data may often need to be identifiable (i.e. contains personal information) during the process of research, e.g. for analysis. If data is identifiable then ethical and privacy requirements can be met through access control and data security. This may take the form of:

Control of access through physical or digital means (e.g. passwords)
Encryption of data, particularly if it is being moved between locations
Ensuring data is not stored in an identifiable and unencrypted format when on easily lost items such as USB keys, laptops and external hard drives.
Taking reasonable actions to prevent the inadvertent disclosure, release or loss of sensitive personal information.

Safely sharing sensitive data guide (ARDC)

ANDS’ De-identification Guide collates a selection of Australian and international practical guidelines and resources on how to de-identify datasets.

Attribution:

Australian National Data Service. (2018). ANDS guide: De-identification. Retrieved from https://www.ands.org.au/__data/assets/pdf_file/0003/737211/De-identification.pdf
Australian National Data Service. (2018). Safely sharing sensitive data. Retrieved from https://www.ands.org.au/working-with-data/sensitive-data/sharing-sensitive-data

Key Points

First key point. Brief Answer to questions. (FIXME)

Identifiers

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What is a DOI?

What is a PID?

Objectives

First learning objective. (FIXME)

Digital Object Identifier (DOI) and Persistent identifier (PiD)

Once you’ve completed your project, help make your research data discoverable, accessible and possibly re-useable using a PiD such as a DOI!

A Digital Object Identifier (DOI) is a unique alphanumeric string, assigned by a publisher, organisation or agency, that identifies content and provides a PERSISTENT link to its location on the internet, whether the object is digital or physical. It might look something like this: http://dx.doi.org/10.4225/01/4F8E15A1B4D89. The DOI or the Identifier is listed at the bottom of this record from Griffith’s Research Data Repository.

DOIs are also considered a type of persistent identifier (PiD). An identifier is any label used to name something uniquely (whether digital or physical). URLs are an example of an identifier. So are serial numbers and personal names. A persistent identifier is guaranteed to be managed and kept up to date over a defined time period.

Journal publishers assign DOIs to electronic copies of individual articles. DOIs can also be assigned by organisations, research institutes or agencies, and are generally managed by the relevant organisation in line with its policies. DOIs not only uniquely identify research data collections, they also support citation and citation metrics.

Key messages:

DOIs are a persistent identifier and as such carry expectations of curation, persistent access and rich metadata
DOIs can be created for DATA SETS and associated outputs (e.g. grey literature, workflows, algorithms, software etc.) - DOIs for data are equivalent to DOIs for other scholarly publications
DOIs enable accurate data citation and bibliometrics (both metrics and altmetrics)
Resolvable DOIs provide easy online access to research data for discovery, attribution and reuse
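Because a DOI resolves through a central resolver, the same identifier can be written as a bare DOI, with a doi: prefix or as a link. The sketch below, using hypothetical helpers normalise_doi and doi_to_url, converts any of these forms into a link on https://doi.org/. The check it applies (the DOI starts with 10. and contains a slash) is a simple sanity test, not a full validation.

```python
DOI_RESOLVER = "https://doi.org/"
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(text: str) -> str:
    """Return the bare DOI, for example '10.4225/01/4F8E15A1B4D89'."""
    cleaned = text.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Not a DOI: {text!r}")
    return cleaned


def doi_to_url(text: str) -> str:
    """Return a resolvable link for the DOI."""
    return DOI_RESOLVER + normalise_doi(text)


print(doi_to_url("http://dx.doi.org/10.4225/01/4F8E15A1B4D89"))
# prints https://doi.org/10.4225/01/4F8E15A1B4D89
```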

Beginner

Ensure data you associate with a publication has a DOI - your library is the best group to talk to about this.

Intermediate

Learn more about how your DOI can potentially increase your citation rates by watching this 4m:51s video

Learn more about how your DOI can potentially increase your citation rate by reading the ANDS Data Citation Guide

Advanced

Learn more about PiDs and DOIs https://www.ands.org.au/guides/persistent-identifiers-awareness

Internal Resources

Contact the Library team for advice on how to obtain a DOI upon project completion.

External Resources

Key Points

A DOI is a Digital Object Identifier

A PiD is a Persistent identifier

Reproducible Research Things
