Organising and documenting your data will also help you:
- communicate with colleagues about your research
- find data you have created earlier
- prepare your data for archiving and sharing with others at the end of the project
- make your data FAIR (findable, accessible, interoperable, reusable).
File structure
Think about the best hierarchy for files. Should they be organised by:
- type of data: text, dataset, images?
- research activities: interviews, surveys?
- type of material: documentation, publications, data?
Make an effective hierarchy and stick to it. It will help you reliably find files in the future and know exactly where files should go as your project progresses.
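As a minimal sketch, the hierarchy could be created once at the start of a project so that every team member works from the same layout. The folder names and the function name below are hypothetical, and the layout assumes you organise by type of material; adapt it to whichever scheme you choose.

```python
from pathlib import Path

# Hypothetical layout, organised by type of material (an assumption for illustration).
LAYOUT = [
    "data/raw",
    "data/processed",
    "documentation",
    "publications",
]


def create_hierarchy(project_root, layout=LAYOUT):
    """Create each folder in the layout under project_root and return the paths."""
    root = Path(project_root)
    created = []
    for relative in layout:
        folder = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_hierarchy("my_project") a second time changes nothing, so the same script can be shared with the whole team.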
File names
Meaningful file names help you know the content and status of a file:
- use terms like project acronyms, researcher’s initials, or information that describes the type of file
- add version numbers, file status, or a date
- keep file names short
- don't use spaces or special characters
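One way to apply these points consistently is to build file names with a small helper rather than typing them by hand. The sketch below is only an illustration: the function names, the order of the parts and the underscore separator are assumptions, not a required convention.

```python
import re
from datetime import date


def clean(part):
    """Replace spaces with hyphens and drop anything other than letters, digits and hyphens."""
    part = part.strip().replace(" ", "-")
    return re.sub(r"[^A-Za-z0-9-]", "", part)


def build_file_name(project, initials, description, version, extension, on=None):
    """Join the parts into one name: project, initials, description, date, version."""
    on = on or date.today()
    parts = [
        clean(project),
        clean(initials),
        clean(description),
        on.isoformat(),      # ISO dates (YYYY-MM-DD) sort chronologically
        f"v{version:02d}",   # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lstrip(".")
```

For example, build_file_name("PROJ", "AB", "interview notes", 2, "txt", on=date(2024, 3, 5)) gives PROJ_AB_interview-notes_2024-03-05_v02.txt.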
Versioning
Version control helps you distinguish between different iterations of your work, so that you can find correct versions as needed. You should decide:
- how many copies of a file you need to keep
- how long you need to keep them
- how you will tell each version apart, for example by using a consistent naming convention (see above)
If you store files in various places, you'll need to remember to synchronise the copies regularly.
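If you tell versions apart through the file name, finding the current version can be automated. The sketch below assumes a hypothetical convention in which names end in _vNN before the extension (for example survey_v03.csv); the pattern and function names are illustrative only.

```python
import re

# Assumes names end in _vNN before the extension, e.g. survey_v03.csv (hypothetical convention).
VERSION_PATTERN = re.compile(r"_v(\d+)\.[^.]+$")


def version_of(file_name):
    """Return the version number in a file name, or None if it has none."""
    match = VERSION_PATTERN.search(file_name)
    return int(match.group(1)) if match else None


def latest_version(file_names):
    """Return the name with the highest version number, ignoring unversioned names."""
    versioned = [name for name in file_names if version_of(name) is not None]
    return max(versioned, key=version_of) if versioned else None
```

Comparing the numbers rather than the text means v10 is correctly treated as newer than v02, even if someone forgets the zero padding.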
If your data isn't digital
If notebooks, physical documents, models, artefacts or live experiences form part of your data, you need to manage and care for them as well as your digital data.
Make digital copies to preserve the data against physical damage and prepare it for long-term storage and sharing. You'll need to identify appropriate formats that represent the data and are suitable for preservation and reuse - plan ahead for this.
Choosing file formats for archiving
Data created in digital formats may also need converting to formats suitable for storage and sharing. Make sure you build time and resources into your project to allow for this. Using open or standard formats from the start, and making sure your data are backed up in open formats as you go along, can save a lot of time at the end of the project.
Choose file formats that minimise the work others need to do to reuse your data and that give your data the best chance of long-term preservation.
See advice on file formats and recommended formats for various types of data.
Interviews and other audio data will need transcribing, so make sure you plan for this specialist task.
More about creating digital surrogates of your work and preparing your data for archiving.
Keep your data safe
Data can be lost in many different ways: through human error, hardware failure, software or media faults, or malicious hacking and virus infection. Digital data files can also be corrupted in storage or through file transfer.
How you can protect your data:
- Use the University of Kent network and Data Centre to process, store and back up your data. The University of Kent Data Centre is covered by Cyber Essentials certification.
- Back up your work regularly. If files contain sensitive data or personal information, create only a minimum number of copies (for example one master file and a backup copy)
- Encrypt your files
- Make sure your laptop is encrypted
- Follow our IT safety and security advice
- Make sure you choose the optimum file format for your data in the long term.
- Deposit your data in a trustworthy repository at the end of your project.
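Because files can be corrupted in storage or during transfer, it can help to check that a backup or transferred copy is identical to the original. The sketch below shows one common way to do this, by comparing SHA-256 checksums using Python's standard hashlib module. It is an illustration rather than a University procedure, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large data files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if both files have identical contents according to SHA-256."""
    return sha256_of(original) == sha256_of(copy)
```

Recording the checksum of each file when you create a backup lets you repeat the check later without needing the original to hand.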
Document your data
Keep a record of decisions you make about how you organise your data.
During your project this will help you remember your decisions and ensure that all the members of the team are doing the same thing. Documentation will keep your data consistent and help you interpret your data and give it context in the short and long term.
Once your project is completed, you will need to archive this information with your data to help future users understand your data.
The more detail you record, the more useful the documentation will be. Include:
- study-level information: details about the design of the research, methodologies, processes
- data-level information: which may be embedded in the files
- metadata: according to an established schema that is used by data repositories to describe your data.
You'll need to create a README file to describe everything needed to replicate the data and help others use it and understand it properly.
README files explained
When you archive your data at the end of the project, a README file accompanies the datasets to introduce them and give them context. Its purpose is to describe everything needed to replicate the data, or to use it and understand it properly.
If you have kept notes throughout the project and have an up-to-date data management plan, creating the README file will be straightforward.
The outline below shows one way of approaching a README file, questions you could answer, and information you could include.
Outline
Data and file overview
- For each file, a short description of what it contains, and who created it
- Format of the file if not obvious from the file name
- If the dataset includes multiple files that relate to one another, the relationship between the files or a description of the file structure that holds them - you could use terminology like 'dataset' or 'study' or 'data package'
- Date the file was created, dates of updates (versions) and the nature of the updates
- Information about any related data that was collected but isn't in the described dataset.
Methodological information
- Description of methods for data collection or generation - include links or references to publications or other documentation containing experimental design or protocols used
- Description of methods used for data processing - describe how the data was generated from the raw or collected data
- Any instrument-specific information needed to understand or interpret the data
- Standards and calibration information, if appropriate
- Description of any quality-assurance procedures performed on the data
- Definitions of codes or symbols used to flag low-quality or questionable values and outliers that users should be aware of
- People involved with sample collection, processing, analysis and/or submission
- Legal or ethical considerations and agreements.
Data-specific information
- Number of variables, and number of cases or rows
- List of variables, including full names and definitions of column headings for tabular data - spell out any abbreviated words
- Units of measurement
- Definitions for codes or symbols used to record missing data
- Specialized formats or other abbreviations used.
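For tabular data, the variable list and the counts can be drawn from the file itself rather than typed by hand. The sketch below assumes a UTF-8 CSV file whose first row holds the column headings; the function name and the returned keys are hypothetical. Definitions, units and missing-data codes still need to be written by someone who knows the data.

```python
import csv


def summarise_csv(path):
    """Return the column headings, variable count and row count of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        headings = next(reader, [])  # first row is assumed to hold the headings
        row_count = sum(1 for row in reader if row)  # blank lines are not counted
    return {
        "variables": headings,
        "variable_count": len(headings),
        "row_count": row_count,
    }
```

The result can be pasted into the data-specific part of the README and then expanded with full names and definitions for each heading.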
Email researchsupport@kent.ac.uk to obtain a blank README file template.
Select your data
To make your data FAIR - ie 'as open as possible, as restricted as necessary' - you need to decide which datasets to archive and share.
The Digital Curation Centre (DCC) outlines five steps to help you decide what to keep.
Get support
Get support with organising and selecting your research data by emailing the Open Research Team.