Data organization, documentation and metadata

When you collect a large number of data files, it is easy to lose track of them. A consistent and logical organization of your files and file names helps prevent that. In addition, it is advisable to record the decisions you make about your data during your research.
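
As an illustration of a consistent file naming scheme, here is a minimal Python sketch. The convention it uses (date, project, description, version) and the function name make_filename are assumptions made for this example, not a prescribed standard; adapt them to whatever your group agrees on.

```python
import re
from datetime import date


def make_filename(project, description, day, version, extension):
    """Build a file name of the form YYYY-MM-DD_project_description_vNN.ext.

    This pattern is a hypothetical convention chosen for illustration.
    """
    def slug(text):
        # Lowercase the text and replace every run of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{day.isoformat()}_{slug(project)}_{slug(description)}"
        f"_v{version:02d}.{extension}"
    )


# Placeholder values, not taken from a real project.
example_name = make_filename("Sleep Study", "raw survey data", date(2024, 3, 5), 2, "csv")
```

Putting the date first in ISO format (year, month, day) means an alphabetical listing is also chronological, and the zero-padded version number keeps v02 ahead of v10.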

To keep your data usable for yourself and others, it is advisable to describe your data. Good documentation during research ensures that your data can be understood, now and in the future, and interpreted properly in the relevant context. Documentation files include files that describe the content of your dataset (for example codebooks, which explain the variables and their meanings); files that explain the context of the data (version logs, methodology, etc., which answer the who, what, why, where and when of the data); and files that describe the structure of the dataset (for example a readme file that contains an overview of the various folders and files of the dataset).
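
The folder and file overview for a readme file can be generated rather than typed by hand. The following Python sketch shows one way to do that; the function name build_overview and the indented layout are assumptions made for this example, and the output is meant to be pasted into a readme and then annotated by hand.

```python
from pathlib import Path


def build_overview(root):
    """Return an indented text listing of every folder and file under root."""
    root = Path(root)
    lines = [root.name + "/"]
    # Sorting the paths keeps each folder directly above its own contents.
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        suffix = "/" if path.is_dir() else ""
        lines.append("  " * depth + path.name + suffix)
    return "\n".join(lines)
```

Calling build_overview on the top-level folder of a dataset returns a single string, with folders marked by a trailing slash and each level indented by two spaces.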

When data is stored for the long term, metadata are needed as well. Metadata are data about data: they are the characteristics or properties of a file. As soon as you start publishing or archiving your data, you will be asked to provide these metadata. They help others discover your data and typically list information such as the researchers involved, a description, keywords and the language of the data.
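
To make the idea of a metadata record concrete, here is a minimal Python sketch that collects the four kinds of information mentioned above and writes them out as JSON. The field names, the function build_metadata and all values are placeholders assumed for this example; a repository or archive will prescribe its own metadata schema, which takes precedence.

```python
import json

# Hypothetical field names, mirroring the information listed in the text above.
REQUIRED_FIELDS = ("researchers", "description", "keywords", "language")


def build_metadata(researchers, description, keywords, language):
    """Return a metadata record as a dict, rejecting empty fields."""
    record = {
        "researchers": list(researchers),
        "description": description,
        "keywords": list(keywords),
        "language": language,
    }
    missing = [name for name in REQUIRED_FIELDS if not record[name]]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return record


# Placeholder values, not taken from a real dataset.
record = build_metadata(
    researchers=["Researcher A", "Researcher B"],
    description="Placeholder description of the dataset.",
    keywords=["placeholder keyword"],
    language="en",
)
metadata_json = json.dumps(record, indent=2, ensure_ascii=False)
```

Keeping such a record next to the data from the start of a project makes it easier to fill in a repository's deposit form later.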