Organise and select your data

Funders and publishers want to see evidence that you are using good practice to keep your data safe and well managed, so that they can be sure of the data's integrity.

Organising and documenting your data will also help you:

File structure

Think about the best hierarchy for files. Should they be organised by:

  • type of data: text, dataset, images?
  • research activities: interviews, surveys?
  • type of material: documentation, publications, data?

Make an effective hierarchy and stick to it. It will help you reliably find files in the future and know exactly where files should go as your project progresses.

File names

Meaningful file names help you know the content and status of a file:

  • use terms like project acronyms, researcher’s initials, or information that describes the type of file
  • add version numbers, file status, or a date
  • keep file names short
  • don't use spaces or special characters
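To illustrate the conventions above, the Python sketch below builds a file name from a project acronym, the researcher's initials, a short description, a date and a version number. The pattern `PROJECT_initials_description_YYYYMMDD_vNN`, the function name `make_file_name` and the acronym `KDATA` are hypothetical examples, not a University of Kent standard. The sketch assumes hyphens and underscores are acceptable separators and replaces spaces and other special characters with hyphens.

```python
import re
from datetime import date


def make_file_name(project: str, initials: str, description: str,
                   version: int, created: date, extension: str) -> str:
    """Build a short file name with no spaces or special characters.

    Hypothetical pattern: PROJECT_initials_description_YYYYMMDD_vNN.ext
    """
    def clean(part: str) -> str:
        # Replace each run of characters other than letters and digits
        # with a single hyphen, then trim hyphens from the ends
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-")

    parts = [
        clean(project).upper(),        # project acronym
        clean(initials).lower(),       # researcher's initials
        clean(description).lower(),    # short description of the content
        created.strftime("%Y%m%d"),    # date as YYYYMMDD
        f"v{version:02d}",             # zero-padded version number
    ]
    return "_".join(parts) + "." + clean(extension).lower()


print(make_file_name("KDATA", "ab", "interview transcripts", 3,
                     date(2024, 5, 17), "docx"))
# KDATA_ab_interview-transcripts_20240517_v03.docx
```

Writing the date as YYYYMMDD means files sort chronologically when listed alphabetically, and zero-padding the version number keeps v02 ahead of v10.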

See examples of file structures and naming conventions.

Versioning

Version control helps you distinguish between different iterations of your work, so that you can find the correct version when you need it. You should decide:

  • how many copies of a file you need to keep
  • how long you need to keep them
  • how you will tell each version apart, for example by using a consistent naming convention (see above)
  • how you will synchronise the copies regularly, if you store files in various places
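If you build version numbers into your file names consistently, telling versions apart can be partly automated. This minimal Python sketch assumes a hypothetical `_vNN` suffix placed just before the file extension, and works out the name for the next version from the files that already exist. `next_version_name` is an illustrative helper, not part of any particular tool.

```python
import re

# Hypothetical convention: the version is "_vNN" just before the extension,
# e.g. KDATA_ab_survey_v02.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)(?P<ext>\.[A-Za-z0-9]+)$")


def next_version_name(existing: list[str], stem: str, ext: str) -> str:
    """Return the file name for the next version of `stem`, given existing names."""
    highest = 0
    for name in existing:
        match = VERSION_PATTERN.match(name)
        # Only count files with the same stem and extension
        if match and match["stem"] == stem and match["ext"] == ext:
            highest = max(highest, int(match["num"]))
    return f"{stem}_v{highest + 1:02d}{ext}"


files = ["KDATA_ab_survey_v01.csv", "KDATA_ab_survey_v02.csv", "KDATA_ab_notes_v05.txt"]
print(next_version_name(files, "KDATA_ab_survey", ".csv"))  # KDATA_ab_survey_v03.csv
```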

See up-to-date recommendations on versioning.

If your data isn't digital

If notebooks, physical documents, models, artefacts or live experiences form part of your data, you need to manage and care for them as well as your digital data.

Make digital copies to preserve the data against physical damage and prepare it for long-term storage and sharing. You'll need to identify appropriate formats that represent the data and are suitable for preservation and reuse - plan ahead for this.

Choosing file formats for archiving

Data created in digital formats may also need converting to formats for storage and sharing. Make sure you build time and resources into your project to allow for this. Using open or standard formats from the start, and making sure your data are backed up in open formats as you go along, can save a lot of time at the end of the project.

Choose file formats that keep to a minimum the work others need to do to reuse your data, and that best preserve your data.

See advice on file formats and recommended formats for various types of data.

Interviews and other audio data will need transcribing, so make sure you have planned for this specialist task.

More about creating digital surrogates of your work and preparing your data for archiving.

Keep your data safe

Data can be lost in many different ways: through human error, hardware failure, software or media faults, or malicious hacking and virus infection. Digital data files can also be corrupted in storage or through file transfer.
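One way to spot corruption in storage or after a transfer is to compare checksums: almost any change to a file, even a single byte, produces a different checksum. The Python sketch below uses the standard `hashlib` module to compute a SHA-256 checksum for a file and check that a copy matches the original. The helper names `sha256_of` and `copy_is_intact` are hypothetical, and a matching checksum only shows that a copy is identical; it does not protect the data on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original: Path, copy: Path) -> bool:
    """True if the copy's checksum matches the original's."""
    return sha256_of(original) == sha256_of(copy)


# Usage: write a small file, copy its bytes, and compare checksums
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.csv"
    backup = Path(tmp) / "data_backup.csv"
    original.write_bytes(b"id,score\n1,42\n")
    backup.write_bytes(original.read_bytes())
    print(copy_is_intact(original, backup))  # True
```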

How you can protect your data:

Document your data

Keep a record of decisions you make about how you organise your data.

During your project this will help you remember your decisions and ensure that all members of the team are doing the same thing. Documentation keeps your data consistent, helps you interpret it, and gives it context in both the short and the long term.

Once your project is completed, you will need to archive this information with your data to help future users understand your data.

The more detail you record, the more useful the documentation will be. Include:

  • study-level information: details about the design of the research, methodologies and processes
  • data-level information: details that may be embedded in the files themselves
  • metadata: structured according to an established schema that data repositories use to describe your data.

You'll need to create a README file to describe everything needed to replicate the data and help others use it and understand it properly.

README files explained

When you archive your data at the end of the project, a README file accompanies the datasets to introduce them and give them context. Its purpose is to describe everything needed to replicate the data, or to use it and understand it properly. 

If you have kept notes throughout the project and have an up-to-date data management plan, creating the README file will be straightforward.

The outline below shows one way of approaching a README file, questions you could answer, and information you could include.

Outline

Data and file overview

  1. For each file, a short description of what it contains and who created it
    • format of the file, if not obvious from the file name
    • if the dataset includes multiple files that relate to one another, the relationship between the files or a description of the file structure that holds them - you could use terminology like 'dataset', 'study' or 'data package'
  2. Date the file was created, dates of updates (versions) and the nature of the updates
  3. Information about any related data that was collected but isn't in the described dataset.

Methodological information

  1. Description of methods for data collection or generation - include links or references to publications or other documentation containing experimental design or protocols used
  2. Description of methods used for data processing - describe how the data was generated from the raw or collected data
    • any instrument-specific information needed to understand or interpret the data
    • standards and calibration information, if appropriate
    • describe any quality-assurance procedures performed on the data
    • definitions of codes or symbols used to flag low-quality or questionable values and outliers that people should be aware of
  3. People involved with sample collection, processing, analysis and/or submission
  4. Legal or ethical considerations and agreements.

Data-specific information

  1. Number of variables, and number of cases or rows
  2. List of variables, including full names and definitions of column headings for tabular data - spell out any abbreviated words
  3. Units of measurement
  4. Definitions for codes or symbols used to record missing data
  5. Specialised formats or other abbreviations used.
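For tabular data, the first two items above can be gathered automatically. The Python sketch below assumes a CSV file with a single header row; `summarise_table` is a hypothetical helper that counts the variables (columns) and cases (rows) and lists the column headings, which you would still need to expand into full names, definitions and units by hand.

```python
import csv
import io
from typing import TextIO


def summarise_table(handle: TextIO) -> dict:
    """Count variables (columns) and cases (rows) in a CSV with one header row.

    Assumes the file is not empty; blank lines are not counted as cases.
    """
    reader = csv.reader(handle)
    headers = next(reader)                      # first row holds the column headings
    row_count = sum(1 for row in reader if row)  # csv.reader yields [] for blank lines
    return {"variables": len(headers), "cases": row_count, "headings": headers}


# Usage with an in-memory sample; for a real file, open it with newline=""
sample = io.StringIO("id,age_years,resp_code\n1,34,A\n2,,B\n")
summary = summarise_table(sample)
print(summary["variables"], summary["cases"])  # 3 2
for heading in summary["headings"]:
    print(f"{heading}: <full name, definition and units>")
```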

Email researchsupport@kent.ac.uk to obtain a blank README file template.

Select your data

To make your data FAIR (Findable, Accessible, Interoperable and Reusable), and 'as open as possible, as restricted as necessary', you need to decide which datasets to archive and share.

The Digital Curation Centre (DCC) outlines five steps to help you decide what to keep. They are:

  1. Identify what purposes the data could fulfil
  2. Identify what data must be kept
  3. Identify what data should be kept
  4. Weigh up the costs
  5. Complete the data appraisal - using the DCC checklist.

Get support

Get support with organising and selecting your research data by emailing the Open Research Team.
