How we easily keep track of our research data

(c) http://phdcomics.com/comics/archive/phd101212s.gif

Let me start with an illustration of today’s topic: version control. The comic shows what happens when your files are not under version control.

We may keep adding comments and numbers to the file name, trying to version it that way. But the result is several files on the computer with different versions. How do we know which file is the current one to work on, and which file has the comments from a colleague?

It is time to think about using version control. You may know the idea from Word documents, where you can track changes made within the document. But what about your data itself, your images, and so on? You may alter them too and want to keep their different versions.

git – (not only) for geeks

There is software that can take care of your data and its versions in an incredibly efficient way. It is simply called “git” (https://git-scm.com).

Although it was developed by the creator of Linux to maintain the Linux code, it can handle more than just code files. It is also a tool for collaboration, since it makes it easier for several people to change the same files. Take a look at the videos to learn quickly about version control and git (https://git-scm.com/videos), or consider reading its manual (https://git-scm.com/book/en/v2).

Git is mainly used via the command line, but several graphical clients are available. Browse through the list of clients.
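
To give an impression of what working with git on the command line looks like, here is a minimal sketch. The folder name `my-project`, the file `measurements.csv`, its contents, and the name and e-mail address are placeholders chosen for illustration only; your own project will look different.

```shell
# Create a project folder and put it under version control
mkdir -p my-project
cd my-project
git init

# Tell git who is making the changes (placeholder values, valid for this folder only)
git config user.name "Your Name"
git config user.email "you@example.com"

# Create a first file and record it as the first version
echo "sample,value" > measurements.csv
git add measurements.csv
git commit -m "Add first measurements"

# Change the file and record a second version
echo "A1,0.42" >> measurements.csv
git commit -am "Add sample A1"

# List the recorded versions, newest first
git log --oneline
```

Instead of saving copies such as `measurements_final_v2.csv`, there is one file, and every recorded version stays retrievable from the history.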

RWTH service: GitLab

RWTH Aachen University offers GitLab as an online platform for versioning research data and organizing collaborations: https://git.rwth-aachen.de/.

Students and employees can log in via Shibboleth; external partners can use a GitHub account for authentication.

If you are not familiar with the online platform GitLab, there are plenty of helpful introductory videos available, e.g. *GitLab Beginner Tutorial 1 | Introduction and Getting Started*, or you can take a look at the documentation (in the RWTH-Wiki).

Workshops available

If you would like to know more about git and GitLab, there are workshops offered by RWTH:

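“Management wissenschaftlicher Quellcodes und anderer Forschungsdaten – Grundlagen der Versionsverwaltung mit git und GitLab” (in English: “Management of scientific source code and other research data: basics of version control with git and GitLab”)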

Moodle course (also suitable for self-study).

If about ten people are interested in such a workshop, we can get it exclusively for the CRC. Let me know if you would like to do that.

Further Reading

If you think that version control is not suitable for your field of research, have a look at this article:

Blischak JD, Davenport ER, Wilson G (2016) A Quick Introduction to Version Control with Git and GitHub. PLoS Comput Biol 12(1): e1004668. https://doi.org/10.1371/journal.pcbi.1004668

Note that the authors are not IT geeks at all: they are affiliated with the Committee on Genetics, Genomics, and Systems Biology at the University of Chicago and the Department of Molecular Biology and Genetics at Cornell University, and they give a field-related introduction to the topic.

This is the newsletter of the CRC 1382, in which topics regarding (good) research data management are regularly discussed.

The information provided is selected by the data steward, Dr. Lukas C. Bossert.
It is tailored to meet the standards and requirements of the UKA and RWTH.

If you think that the tips and tricks provided do not fit your data, I would be happy to discuss this and take a look at your data and its organization.