How we can organize, name and structure research data

It is easy to lose a lot of time searching for files on a computer, especially when they are unorganized or named and structured arbitrarily.
Below are a few hints for avoiding common pitfalls. They make workflows smoother and bring noticeable benefits, both for yourself and for the CRC (think of exchanging your data with colleagues).

In this newsletter you will learn about certain ways of

  1. organizing and structuring your data, and
  2. naming them meaningfully and systematically.

The MIT Libraries Data Management Services (similar to the FDM Services of the RWTH) provide very clear and concrete slides, which I recommend going through. You can find the slides in the CRC-SharePoint (if you have trouble logging in, please send an email to gutliver@ukaachen.de).

I adapted these slides to provide some suggested guidelines.

Organizing and structuring your data

There are two ways to organize data: hierarchically or by tags. Since the hierarchical system is the most commonly used, I will focus on it here.

In a hierarchical structure, files are organized in levels that are represented by folders, subfolders, sub-subfolders and so on.
This approach is familiar to almost everyone, it describes the structure of information well, and it lets you store similar items in the same folder.

At the same time, this structure can be hard to set up: you need to find the right balance between breadth and depth, that is, between too many subfolders and keeping most files in one (main) folder. Reorganizing your files can be a lot of work, and a file can only be stored in one place unless you create shortcuts or aliases.

To find a systematic file folder structure you should go through the following steps:

  • Define the types of data and file formats; what are your data sets?
  • Collect and include the important contextual information
    • Imagine you (or someone from your team) are looking for a specific file – how do you want to find it?
      • By time period?
      • By creator/project collaborators?
      • By activity or collection method?
      • By its type (e.g. presentation, report)?
    • Organize folders by meaningful categories (e.g. data sets; see also http://obofoundry.github.io/; for classifications check https://www.dimdi.de/dynamic/de/klassifikationen/)
      • Go from general to specific: “Primary / Secondary / Tertiary”
        • e.g. [project] / [sub-project] / [experiment] / [instrument] / [date]
        • e.g. [research area] / [project] / [data or documentation] / [date]
        • e.g. [project] / [type of file] / [data collector name] / [date]
      • Choose a directory naming convention
        • Determine your unique elements
        • Consider ordering and abbreviations
          • e.g. use “figures”, not “figs” or “figure”
          • Use only last names for persons; if you have to differentiate, add the initial of the first name as a suffix (e.g. mueller-j, mueller-b)

An exemplary folder structure could be like this:
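
As a minimal sketch of the pattern [project] / [sub-project] / [experiment] / [instrument] / [date] from the list above, the following Python snippet creates such a folder tree. All project, experiment and instrument names, as well as the helper functions `build_folder_path` and `create_folders`, are hypothetical and serve only as an illustration; adapt them to your own data.

```python
import tempfile
from pathlib import Path


def build_folder_path(root, project, sub_project, experiment, instrument, date):
    """Return root/project/sub-project/experiment/instrument/date (general to specific)."""
    parts = [project, sub_project, experiment, instrument, date]
    # Lowercase the names and replace spaces so that folder names stay consistent.
    cleaned = [part.strip().lower().replace(" ", "-") for part in parts]
    return Path(root).joinpath(*cleaned)


def create_folders(root, layout):
    """Create one folder per entry in layout and return the created paths."""
    created = []
    for entry in layout:
        path = build_folder_path(root, *entry)
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Hypothetical example values (assumptions, not taken from a real project).
EXAMPLE_LAYOUT = [
    ("crc1382", "a01", "exp01", "microscope", "2020-01-31"),
    ("crc1382", "a01", "exp01", "microscope", "2020-02-03"),
    ("crc1382", "a01", "exp02", "sequencer", "2020-02-10"),
]

if __name__ == "__main__":
    # A temporary directory is used here so that the sketch leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_folders(tmp, EXAMPLE_LAYOUT):
            print(folder.relative_to(tmp).as_posix())
```

Dates are written as YYYY-MM-DD so that the date folders sort chronologically when listed by name.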

Naming files

First, check whether your research community already has a naming convention, for example “The Open Biological and Biomedical Ontology (OBO)”.

If there is no such ontology/naming convention you can adopt, you need to define a new one.

Naming conventions should be:

  • Descriptive; consider including
    • Unique identifier (e.g. project name)
    • Conditions (e.g. lab instrument, solvent, temperature etc.)
    • Run of experiment (e.g. sequential)
    • Date (in addition to the date stored in the file properties)
    • Version number (e.g. v1, v1a, v2c)
  • Consistent
    • Date: stick to one date format, preferably YYYY-MM-DD (e.g. 2020-01-31)
    • Numbers: use the same number of digits throughout and pad with leading zeros where necessary (e.g. 00123, 03948)
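
The points above can be sketched in a few lines of Python. The scheme used here (project, condition, zero-padded run number, date as YYYY-MM-DD, version) is only one possible convention and an assumption for illustration; the functions `build_file_name` and `follows_convention` and all example values are hypothetical.

```python
import re
from datetime import date


def build_file_name(project, condition, run, day, version, extension):
    """Compose a name of the form project_condition_runNNN_YYYY-MM-DD_vX.ext."""
    if not 0 <= run <= 999:
        raise ValueError("run must fit into three digits")
    # {run:03d} pads the run number with leading zeros; isoformat() yields YYYY-MM-DD.
    return f"{project}_{condition}_run{run:03d}_{day.isoformat()}_v{version}.{extension}"


# Pattern describing the assumed convention; adjust it to your own scheme.
NAME_PATTERN = re.compile(
    r"[a-z0-9-]+_[a-z0-9-]+_run\d{3}_\d{4}-\d{2}-\d{2}_v\d+[a-z]?\.[a-z0-9]+"
)


def follows_convention(file_name):
    """Return True if file_name matches the assumed naming pattern."""
    return NAME_PATTERN.fullmatch(file_name) is not None


if __name__ == "__main__":
    name = build_file_name("crc1382", "37c-ethanol", 7, date(2020, 1, 31), "1a", "csv")
    print(name)                                       # crc1382_37c-ethanol_run007_2020-01-31_v1a.csv
    print(follows_convention(name))                   # True
    print(follows_convention("Data final (2).xlsx"))  # False
```

A check like `follows_convention` could be run over a folder from time to time to spot files that drifted away from the agreed scheme.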

For some best practices, see the handout of the MIT Libraries Data Management Services.

Regarding lab instruments, check whether the instrument, software, or other equipment that outputs your data files can be configured to use a file naming scheme.

The last step is to document your data organizing structure and your naming convention.

This is typically and ideally done with a data management plan.

Name conventions make life easier!

Research Data Management, RWTH Aachen University

Further Reading

Bühler P., Schlaich P., Sinner D. (2019) Dateien. In: Datenmanagement. Bibliothek der Mediengestaltung. Springer Vieweg, Berlin, Heidelberg; https://doi.org/10.1007/978-3-662-55507-1_2

This is the newsletter of the CRC 1382, in which topics regarding (good) research data management are regularly discussed.

The information provided is selected by the data steward, Dr. Lukas C. Bossert.
It is tailored to meet the standards and requirements of the UKA and RWTH.

If you think that the tips and tricks provided do not fit your data, I would be happy to discuss this and take a look at your data and its organization.