Metadata: Data about Data?
What is it?
Metadata is commonly defined as data about data, and has been described as a “love note to the future” (Scott, Jason. 2011. “The Metadata Mania.” ASCII. http://ascii.textfiles.com/archives/3181). It gives data the context needed to understand and reuse it at a later point in time.
Types of Metadata
Descriptive Metadata
Defines what the data is about, for example: the dataset’s title, a summary or abstract, keywords, and subject categories.
Structural Metadata
Describes how the data is organized and relates internally, for example: file formats, folder hierarchies, database schemas, and collection groupings.
Administrative Metadata
Records who created or modified the data and how it has been managed, for example: version history, ownership or licensing information, and access controls.
Quality Metadata
Indicates how reliable or fit for purpose the data is, for example: quality scores or ranks, validation checks performed, and known limitations.
Metadata unlocks the messages within the data.
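To make the four types concrete, here is a minimal sketch of a metadata record for an imaginary dataset. The grouping keys, field names, and values are all illustrative assumptions and do not come from any particular metadata standard.

```python
import json

# Hypothetical metadata record; every field name and value below is
# invented for illustration and is not taken from a real standard.
record = {
    "descriptive": {
        "title": "Soil moisture measurements, field site A",
        "abstract": "Hourly soil moisture readings from ten sensors.",
        "keywords": ["soil moisture", "sensors"],
    },
    "structural": {
        "file_format": "CSV",
        "folder": "raw/2024/",
    },
    "administrative": {
        "creator": "Jane Doe",
        "version": "1.0",
        "license": "CC-BY-4.0",
    },
    "quality": {
        "validation_checks": ["range check"],
        "known_limitations": "Sensor 7 offline for part of the period.",
    },
}

# Serialising to JSON gives a plain-text form that can be stored
# alongside the data files it describes.
print(json.dumps(record, indent=2))
```

Keeping such a record in a plain-text file next to the data is one simple way to keep the context attached to the dataset; a real project would normally follow the field names of a community standard instead.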
Why Bother?

Taking care to curate your data with rich metadata at the point of collection—and storing it in a well-organized, annotated form—yields benefits across the research lifecycle:
Findability: Detailed metadata (titles, keywords, controlled-vocabulary terms) makes your data easy to discover, whether you’re searching your own archives or colleagues are hunting in public repositories.
Machine-Actionability: Standardized, ontology-driven annotations let software tools automatically interpret, validate, and combine datasets at scale (e.g., in high-throughput or AI-driven workflows).
Reproducibility & Reuse: You (and others) can revisit the dataset later to replicate analyses, build upon prior results, or integrate it into new studies.
Credit & Attribution: Metadata fields for authorship, licensing, and version history ensure you receive proper acknowledgement when others use or cite your data—and clarify permissible reuse.
Moreover, funding agencies are increasingly mandating robust research data management (RDM) plans and metadata annotation as conditions for grant support—so good metadata isn’t just best practice, it’s quickly becoming a requirement for securing funding.
Metadata Standards
Metadata standards are community-agreed guidelines that define which metadata fields to include, how those fields should be formatted, and which controlled vocabularies or ontologies to use. Each standard typically provides, for every metadata element:
Field description: what the metadata field represents and whether it’s required or optional
Cardinality: how many values are allowed or expected (e.g., single value vs. list)
Persistent identifier: a stable, unique reference for the field itself
By adhering to a relevant metadata standard, you ensure your dataset is consistent, interoperable, and machine-readable. To discover standards suited to your type of data, visit this link.
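As a rough illustration of how a standard's per-field rules can be checked by software, the sketch below defines a made-up mini-standard and validates a record against it. `FieldSpec`, `STANDARD`, `validate`, and the `example:` identifiers are hypothetical names chosen for this example; real standards publish their own field definitions and identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    description: str           # what the field represents
    required: bool             # required vs. optional
    max_values: Optional[int]  # cardinality; None means any number of values
    identifier: str            # placeholder for the field's persistent identifier


# Hypothetical mini-standard, for illustration only.
STANDARD = {
    "title": FieldSpec("Name of the dataset", True, 1, "example:title"),
    "keywords": FieldSpec("Subject keywords", False, None, "example:keywords"),
    "license": FieldSpec("Terms of reuse", True, 1, "example:license"),
}


def validate(record: dict, standard: dict) -> list:
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    for name, spec in standard.items():
        if name not in record:
            if spec.required:
                problems.append(f"missing required field: {name}")
            continue
        value = record[name]
        # Treat a single value as a list of one so cardinality can be checked.
        values = value if isinstance(value, list) else [value]
        if spec.max_values is not None and len(values) > spec.max_values:
            problems.append(f"too many values for field: {name}")
    for name in record:
        if name not in standard:
            problems.append(f"unknown field: {name}")
    return problems


print(validate({"title": "Site A readings", "license": "CC-BY-4.0"}, STANDARD))
print(validate({"title": ["A", "B"]}, STANDARD))
```

The first call returns an empty list; the second reports that `title` has too many values and that the required `license` field is missing.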
Data Repositories
Data repositories are essential in making data and metadata readily available to the public. There are both national and international efforts to establish and maintain repositories that support the FAIR (Findable, Accessible, Interoperable, Reusable) data principles around the world. For example, the International Nucleotide Sequence Database Collaboration (INSDC) integrates and mirrors three data repositories around the world:
- National Center for Biotechnology Information (NCBI) in the USA
- European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Europe
- DNA Data Bank of Japan (DDBJ) in Japan
To find a suitable data repository for your data, follow this link to use the data submission wizard tool.
Ontologies
An ontology is a structured vocabulary that captures knowledge in a domain to ensure consistent data annotation and interoperability. Key features include:
- Curated terms with clear definitions and synonyms
- Persistent identifiers (e.g., GO:0008150 for Gene Ontology “biological_process”)
- Hierarchical organization (e.g., “cell” is_a “biological_entity”)
- Defined relationships between terms (e.g., “has_part,” “regulates”)
- Cross-references to other resources or ontologies
Examples:
- Gene Ontology (GO): describes gene product functions
- Disease Ontology (DO): standardizes disease names and classifications
By reflecting current scientific knowledge, ontologies enable machines and researchers alike to “speak the same language.”
Look up ontology terms using BioPortal or the Ontology Lookup Service (OLS) v4.
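The key features of an ontology described in this section (identifiers, labels, synonyms, and an is_a hierarchy) can be sketched as a small data structure. The toy ontology below is an assumption for illustration: the `EX:` identifiers are invented, and `TERMS`, `ancestors`, and `find_by_name` are hypothetical names, not part of GO, DO, or any lookup service.

```python
# Toy ontology: each term has an identifier, a label, synonyms, and is_a parents.
# The "EX:" identifiers are invented for this example.
TERMS = {
    "EX:0001": {"label": "biological_entity", "synonyms": [], "is_a": []},
    "EX:0002": {"label": "cell", "synonyms": [], "is_a": ["EX:0001"]},
    "EX:0003": {"label": "neuron", "synonyms": ["nerve cell"], "is_a": ["EX:0002"]},
}


def ancestors(term_id, terms):
    """Return every term reachable through is_a links, nearest first."""
    found = []
    queue = list(terms[term_id]["is_a"])
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(terms[parent]["is_a"])
    return found


def find_by_name(name, terms):
    """Resolve a label or synonym to its identifier, or None if unknown."""
    for term_id, term in terms.items():
        if name == term["label"] or name in term["synonyms"]:
            return term_id
    return None


term_id = find_by_name("nerve cell", TERMS)
print(term_id, [TERMS[a]["label"] for a in ancestors(term_id, TERMS)])
```

Resolving a synonym to a single identifier is what lets two datasets annotated with different wording be recognised as describing the same thing, and walking the is_a links lets a search for a broad term also match data annotated with a narrower one.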
