File formats for sharing, reuse and long-term preservation of research data

When you are working with your data, you will use formats that are appropriate for the tools you are using. For long-term preservation, we recommend that, where possible, data be stored in formats based on open standards.

If you have questions regarding data formats and/or data format conversion issues, contact the eResearch Centre for assistance.

The table below is courtesy of the UK Data Archive. The original document can be found at http://www.data-archive.ac.uk/create-manage/format/formats-table.  

Each type of data below is followed by the formats that are acceptable for sharing, reuse and preservation and, where applicable, other formats that are acceptable for data preservation.

Quantitative tabular data with extensive metadata
(a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data)

Acceptable formats for sharing, reuse and preservation:
SPSS portable format (.por)
delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information
some structured text or mark-up file containing metadata information, e.g. DDI XML file

Other acceptable formats for data preservation:
proprietary formats of statistical packages, e.g. SPSS (.sav), Stata (.dta)
MS Access (.mdb/.accdb)
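The "delimited text and command ('setup') file" option keeps the data matrix in an open format while preserving labels and missing-value definitions. A minimal sketch, assuming SPSS syntax as the setup language: it writes a small dataset to CSV and generates a matching .sps file. The variable names, labels, the missing-value code -9 and the file names survey.csv and survey.sps are hypothetical, and the generated syntax should be checked by running it in SPSS; Stata or SAS setup files would need their own commands.

```python
import csv

# Hypothetical example dataset: age uses -9 as a missing-value code,
# sex is stored as numeric codes 1/2.
ROWS = [
    {"id": 1, "age": 34, "sex": 1},
    {"id": 2, "age": -9, "sex": 2},
]
FORMATS = {"id": "F8.0", "age": "F3.0", "sex": "F1.0"}  # SPSS input formats
VARIABLE_LABELS = {"id": "Respondent ID", "age": "Age in years", "sex": "Sex of respondent"}
VALUE_LABELS = {"sex": {1: "Male", 2: "Female"}}
MISSING_VALUES = {"age": [-9]}


def quote(text: str) -> str:
    """Quote a string for SPSS syntax (an apostrophe is written as two)."""
    return "'" + text.replace("'", "''") + "'"


def write_csv(stream) -> None:
    """Write the data matrix with a header row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=list(FORMATS))
    writer.writeheader()
    writer.writerows(ROWS)


def spss_setup(csv_name: str) -> str:
    """Build an SPSS setup file that reads the CSV and attaches the metadata."""
    variables = " ".join(f"{name} {fmt}" for name, fmt in FORMATS.items())
    labels = "\n  /".join(f"{v} {quote(t)}" for v, t in VARIABLE_LABELS.items())
    lines = [
        "GET DATA /TYPE=TXT",
        f"  /FILE={quote(csv_name)}",
        "  /ARRANGEMENT=DELIMITED",
        '  /DELIMITERS=","',
        "  /FIRSTCASE=2",  # skip the CSV header row
        f"  /VARIABLES={variables}.",
        f"VARIABLE LABELS {labels}.",
    ]
    for var, codes in VALUE_LABELS.items():
        pairs = " ".join(f"{code} {quote(text)}" for code, text in codes.items())
        lines.append(f"VALUE LABELS {var} {pairs}.")
    for var, values in MISSING_VALUES.items():
        lines.append(f"MISSING VALUES {var} ({', '.join(str(v) for v in values)}).")
    lines.append("EXECUTE.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("survey.csv", "w", newline="", encoding="utf-8") as f:
        write_csv(f)
    with open("survey.sps", "w", encoding="utf-8") as f:
        f.write(spss_setup("survey.csv"))
```

Running the script writes survey.csv and survey.sps side by side; opening the .sps file in SPSS should rebuild the labelled dataset from the CSV, while the CSV itself stays readable by any tool.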

Quantitative tabular data with minimal metadata
(a matrix of data with or without column headings or variable names, but no other metadata or labelling)

Acceptable formats for sharing, reuse and preservation:
comma-separated values (CSV) file (.csv)
tab-delimited file (.tab)
delimited text of a given character set, with SQL data definition statements where appropriate

Other acceptable formats for data preservation:
delimited text of a given character set (.txt); only characters not present in the data should be used as delimiters
widely used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)
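The rule that only characters not present in the data should be used as delimiters can be checked automatically before export. The sketch below scans every value, picks the first candidate delimiter that never occurs, and writes the file in an explicitly stated character set, so no quoting is needed. The candidate list, the UTF-8 default and the function names are assumptions for illustration.

```python
from __future__ import annotations

# Candidate delimiters in order of preference (an assumption; adjust as needed).
CANDIDATES = [",", "\t", "|", ";", "^", "~"]


def choose_delimiter(rows: list[list[str]]) -> str:
    """Return the first candidate delimiter that occurs in no value."""
    used = {ch for row in rows for value in row for ch in value}
    if used & {"\n", "\r"}:
        raise ValueError("values contain line breaks; clean them before export")
    for delim in CANDIDATES:
        if delim not in used:
            return delim
    raise ValueError("every candidate delimiter occurs in the data")


def write_delimited(path: str, rows: list[list[str]], encoding: str = "utf-8") -> str:
    """Write rows as delimited text in a stated character set.

    No quoting is needed because the delimiter never appears in the data.
    Returns the delimiter so it can be recorded in the documentation.
    """
    delim = choose_delimiter(rows)
    with open(path, "w", encoding=encoding, newline="\n") as f:
        for row in rows:
            f.write(delim.join(row) + "\n")
    return delim


if __name__ == "__main__":
    sample = [["id", "comment"], ["1", "late, but complete"], ["2", "tab\tinside"]]
    print(repr(choose_delimiter(sample)))  # prints '|'
```

Recording both the returned delimiter and the encoding alongside the file tells a future user exactly how to read it back.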

Geospatial data
(vector and raster data)

Acceptable formats for sharing, reuse and preservation:
ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
geo-referenced TIFF (.tif, .tfw)
CAD data (.dwg)
tabular GIS attribute data

Other acceptable formats for data preservation:
ESRI Geodatabase format (.mdb)
MapInfo Interchange Format (.mif) for vector data
Keyhole Markup Language (KML) (.kml)
Adobe Illustrator (.ai), CAD data (.dxf), Scalable Vector Graphics (.svg)
binary formats of GIS and CAD packages
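Because a shapefile is a set of files rather than a single file, it is easy to deposit an incomplete one. This sketch checks, for a given .shp path, which of the essential (.shp, .shx, .dbf) and optional (.prj, .sbx, .sbn) components are missing from the same folder. It assumes the components share the base name of the .shp file, as shapefile readers expect; the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Component files of an ESRI Shapefile, as listed in the table.
ESSENTIAL = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbx", ".sbn")


def _exists(shp: Path, ext: str) -> bool:
    """True if the component exists with a lower- or upper-case extension."""
    return any(shp.with_name(shp.stem + e).exists() for e in (ext, ext.upper()))


def check_shapefile(shp_path: str) -> dict[str, list[str]]:
    """List the essential and optional components missing for one shapefile."""
    shp = Path(shp_path)
    return {
        "missing_essential": [e for e in ESSENTIAL if not _exists(shp, e)],
        "missing_optional": [e for e in OPTIONAL if not _exists(shp, e)],
    }


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        report = check_shapefile(name)
        if report["missing_essential"]:
            print(f"{name}: incomplete, missing {report['missing_essential']}")
        else:
            print(f"{name}: essential components present")
```

For example, running it as `python check_shp.py roads.shp` (a hypothetical script name) would report whether roads.shx and roads.dbf sit next to roads.shp.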

Qualitative data
(textual)

Acceptable formats for sharing, reuse and preservation:
Extensible Markup Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
Rich Text Format (.rtf)
plain text data, ASCII (.txt)

Other acceptable formats for data preservation:
HyperText Markup Language (HTML) (.html)
widely used proprietary formats, e.g. MS Word (.doc/.docx)
some proprietary/software-specific formats, e.g. NUD*IST, NVivo and ATLAS.ti

Documentation and scripts

Acceptable formats for sharing, reuse and preservation:
Rich Text Format (.rtf)
PDF/A or PDF (.pdf)
HTML (.htm)
OpenDocument Text (.odt)

Other acceptable formats for data preservation:
plain text (.txt)
some widely used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx)
XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0

Digital image data

Acceptable formats for sharing, reuse and preservation:
TIFF version 6 uncompressed (.tif)

Other acceptable formats for data preservation:
JPEG (.jpeg, .jpg), but only if created in this format
TIFF (other versions) (.tif, .tiff)
Adobe Portable Document Format (PDF/A, PDF) (.pdf)
standard applicable RAW image format (.raw)
Photoshop files (.psd)
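A .tif extension does not say whether the image is compressed. In TIFF 6.0 the Compression tag (tag number 259) records this, with value 1 meaning no compression and other values naming a scheme (5 is LZW, for example). The sketch below reads that tag from the first image directory of a classic TIFF using only the standard library. It is a partial check under stated assumptions: it inspects only the first image, does not validate full TIFF 6.0 conformance, and reads the whole file into memory.

```python
from __future__ import annotations

import struct

COMPRESSION = 259  # TIFF tag number for Compression; value 1 = no compression


def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image in a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    (count,) = struct.unpack(endian + "H", data[ifd:ifd + 2])
    for i in range(count):
        entry = ifd + 2 + 12 * i  # each directory entry is 12 bytes long
        (tag,) = struct.unpack(endian + "H", data[entry:entry + 2])
        if tag == COMPRESSION:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[entry + 8:entry + 10])
            return value
    return 1  # tag absent: the TIFF default is 1 (no compression)


def is_uncompressed_tiff(path: str) -> bool:
    """Read the whole file (fine for a sketch, costly for very large images)."""
    with open(path, "rb") as f:
        return tiff_compression(f.read()) == 1
```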

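Digital audio data

Acceptable formats for sharing, reuse and preservation:
Free Lossless Audio Codec (FLAC) (.flac)

Other acceptable formats for data preservation:
MPEG-1 Audio Layer 3 (.mp3), but only if created in this format
Audio Interchange File Format (AIFF) (.aif)
Waveform Audio Format (WAV) (.wav)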

Digital video data

Acceptable formats for sharing, reuse and preservation:
MPEG-4 (.mp4)
motion JPEG 2000 (.mj2)
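Before depositing a collection, it can help to list the files whose formats do not appear in the table at all, so they can be converted or discussed with the eResearch Centre. The sketch below does this by file extension only, which is an assumption: an extension does not prove a file's format or version (a .tif may or may not be uncompressed TIFF 6), and setup files (.sps, .do, .sas) and NVivo or ATLAS.ti project files will be flagged even though the table allows them. Treat the output as a review list rather than a verdict.

```python
from __future__ import annotations

import sys
from collections import Counter
from pathlib import Path

# Every file extension that appears in the table (lower case).
LISTED = {
    ".por", ".sav", ".dta", ".mdb", ".accdb", ".csv", ".tab", ".txt",
    ".xls", ".xlsx", ".dbf", ".ods", ".shp", ".shx", ".prj", ".sbx", ".sbn",
    ".tif", ".tiff", ".tfw", ".dwg", ".dxf", ".mif", ".kml", ".ai", ".svg",
    ".xml", ".rtf", ".html", ".htm", ".doc", ".docx", ".pdf", ".odt",
    ".jpeg", ".jpg", ".raw", ".psd", ".flac", ".mp3", ".aif", ".wav",
    ".mp4", ".mj2",
}


def unlisted_formats(root: str) -> Counter:
    """Count files under root whose extension does not appear in the table."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            if ext not in LISTED:
                counts[ext] += 1
    return counts


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, n in unlisted_formats(folder).most_common():
        print(f"{ext}: {n} file(s) to review")
```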