Defining Metadata
What is Metadata?
Metadata is a combination of keywords, descriptions and other pertinent information about data that allows for its discovery, understanding and use. It is the way researchers and organizations document relevant information about the creation, content, quality and editing process of data and datasets, among other characteristics. Simply put, it is data about data. It can be embedded within the data files themselves or stored as a separate text file.
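As a rough illustration of the "separate text file" option, the sketch below stores metadata as a JSON file next to the data file it describes. The `.meta.json` suffix, the function names and the sample field names are assumptions made for this example; they do not come from any particular metadata standard.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical naming convention: "<data file name>.meta.json"
SIDECAR_SUFFIX = ".meta.json"


def sidecar_path_for(data_path: Path) -> Path:
    """Return the path of the metadata file that sits next to the data file."""
    return data_path.with_name(data_path.name + SIDECAR_SUFFIX)


def write_sidecar(data_path: Path, metadata: dict) -> Path:
    """Save the metadata as a separate, human-readable text file."""
    path = sidecar_path_for(data_path)
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def read_sidecar(data_path: Path) -> dict:
    """Load the metadata that describes the given data file."""
    return json.loads(sidecar_path_for(data_path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up sample data, written to a temporary folder.
        data_file = Path(tmp) / "rainfall.csv"
        data_file.write_text("station,date,mm\nA1,2015-06-01,12.4\n", encoding="utf-8")
        write_sidecar(data_file, {
            "title": "Daily rainfall observations",
            "keywords": ["rainfall", "weather stations"],
            "description": "Illustrative sample record, not real data.",
        })
        print(read_sidecar(data_file)["title"])
```

Keeping the metadata in its own file means it can be read without opening the data, although it can also become separated from the data if the two files are not moved together.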
Think of an online library database. These search engines allow us to look for articles or books by their titles, keywords, specific words within their titles or content, authors, type of document, publication date, and language. We can even combine some of these parameters to obtain specific results, for example: peer-reviewed articles about hurricanes published after 2010. All of this information, and more, is linked to the document itself, so that when we search for specific terms, we find all documents that meet the criteria. This information is metadata, and without it, finding the documents of interest would be nearly impossible.
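The combined search described above can be sketched as a filter over metadata records. This is only a minimal sketch: the `Record` fields, the `search` function and the catalog entries are all invented for illustration, and a real library database would use an index rather than a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Metadata attached to one document (hypothetical fields)."""
    title: str
    keywords: tuple
    doc_type: str
    peer_reviewed: bool
    year: int
    language: str


# Made-up catalog entries, for illustration only.
CATALOG = [
    Record("Hurricane intensity trends", ("hurricanes", "climate"), "article", True, 2014, "en"),
    Record("Hurricane preparedness guide", ("hurricanes", "safety"), "book", False, 2012, "en"),
    Record("Storm surge modeling", ("hurricanes", "flooding"), "article", True, 2008, "en"),
    Record("Coral reef surveys", ("reefs",), "article", True, 2016, "en"),
]


def search(catalog, keyword=None, doc_type=None, peer_reviewed=None, published_after=None):
    """Return the records that satisfy every criterion that was supplied."""
    results = []
    for record in catalog:
        if keyword is not None and keyword not in record.keywords:
            continue
        if doc_type is not None and record.doc_type != doc_type:
            continue
        if peer_reviewed is not None and record.peer_reviewed != peer_reviewed:
            continue
        # "Published after 2010" means a year strictly greater than 2010.
        if published_after is not None and record.year <= published_after:
            continue
        results.append(record)
    return results


if __name__ == "__main__":
    hits = search(CATALOG, keyword="hurricanes", doc_type="article",
                  peer_reviewed=True, published_after=2010)
    for hit in hits:
        print(hit.title, hit.year)
```

Note that the search never looks inside the documents themselves; every criterion is answered from the metadata alone.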
The same concept applies when we create an online profile, be it on a social media portal or for registration on a website. We typically provide some basic information about ourselves: name and/or username, date of birth, email, address, phone number. This is information that the platform creators consider pertinent to have about each user, or that will be displayed on our personal page so others get to know us. This is our personal metadata, the way we introduce ourselves to the cyber-world.
When we fill out a metadata form, we are answering relevant questions about our data, so that other users understand its contents and characteristics, and are able to find it. This process of gathering information, sometimes referred to as “interviewing or being interviewed” about the data, should start by asking and answering the following questions:
- Data content and structure
  - What do the data represent (measurements/observations)?
  - How are they structured (i.e. format)?
  - What are the accuracy and precision of the data?
- Context
  - Where, when and how were the data collected?
  - What processing steps were followed in the production of the data?
  - Who was responsible for data collection and how may they be contacted?
  - What is the research domain for which the data were collected?
  - For what purpose were the data collected/produced?
  - Are the data part of a larger collection (e.g. an ongoing series of measurements or a set of otherwise related data products)?
- Related Standards, Practices and Resources
  - Are there existing archives that specialize in the preservation of the data?
  - Are there existing documentation standards that are commonly used by those archives or the research domain to which the data pertain?
  - What data citation standards are practiced within the specific research domain, and what information is required to support those data citation standards?
  - How do the data products align with the norms, protocols or standards for integration into the target archive(s)?
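One way to picture this "interview" is as a form with one field per question, grouped under the three headings above, plus a check that reports which questions are still unanswered. The field names and the sample answers below are assumptions chosen for this sketch and are not taken from any metadata standard.

```python
# Hypothetical field names, one per question in the list above.
QUESTIONS = {
    "content": ("represents", "structure", "accuracy_and_precision"),
    "context": ("collection", "processing_steps", "contact",
                "research_domain", "purpose", "parent_collection"),
    "standards": ("archives", "documentation_standards",
                  "citation_standards", "archive_alignment"),
}


def missing_answers(record: dict) -> list:
    """List every question that has no answer yet, as 'section.field'."""
    missing = []
    for section, fields in QUESTIONS.items():
        answers = record.get(section, {})
        for field in fields:
            value = answers.get(field)
            # Treat absent fields and blank strings as unanswered.
            if value is None or str(value).strip() == "":
                missing.append(f"{section}.{field}")
    return missing


# A partially completed, made-up record.
DRAFT = {
    "content": {
        "represents": "Hourly rainfall observations",
        "structure": "CSV, one row per station per hour",
        "accuracy_and_precision": "",
    },
    "context": {
        "collection": "Automated gauges, recorded hourly",
        "purpose": "Flood risk assessment",
    },
}

if __name__ == "__main__":
    for name in missing_answers(DRAFT):
        print("Unanswered:", name)
```

A check like this makes gaps visible while the originator of the data still remembers the answers.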
Depending on the metadata standard used and your organization's data documentation requirements, some questions might be eliminated and others added. We introduce some of the existing metadata standards below.
Why Is It Important?
Metadata is essential to help other users find the data in archives, to understand the data’s contents and its creation process, to keep track of any updates and applications, and to allow future reproducibility of the data. We cannot stress enough the importance of documenting relevant information about the data that is being created or edited. Metadata is the biography and the history of data. Without it, we cannot assess with confidence its validity and accuracy.
As originators of data, we should know the answers to the questions mentioned above, but would we be able to answer such questions for data created by someone else? We might be able to figure out what type of information is being represented, and other basic details that can be determined by observing the content. Without metadata, however, we would not know the purpose for which the data was created, the contact information for the person and/or organization responsible for the data, or the quality standards under which the data was created, among other key facts.
Many researchers raise a red flag when acquiring data with no metadata, because using such data can call into question the validity of the research project itself. Some use the saying “garbage in, garbage out” to refer to the use of inaccurate or error-riddled data in analysis and research, because it automatically damages their results. We would not go so far as to call data without metadata garbage, but the lack of metadata does affect the way the data is perceived, read and processed.
How can we know if the data comes from a reliable source? Was the format changed at some point? What equipment was used to collect the data, and at what resolution? Who can we contact to ask these questions and how can we contact them?
If metadata is documented properly, we have immediate access to the information that answers all of these questions and more. Creating data can be a time- and energy-consuming endeavor, and including its metadata ensures that other users get the most out of it.
About Metadata Standards
Metadata is commonly recorded using what are known as metadata standards. These are the different types of templates available for us to input information about data. Depending on the type of data and the requirements of the organization producing it (or the entity for which it is produced), metadata can be represented using standards that include FGDC CSDGM (Federal Geographic Data Committee’s Content Standard for Digital Geospatial Metadata), ISO 19115/19115-2/19139 (Standard for Digital Geospatial Metadata), and Dublin Core.
FGDC CSDGM is the US Federal geospatial metadata standard, although it has been superseded in the development of new metadata by ISO standard versions such as ISO 19115-2, given their capacity to capture more detailed information. Dublin Core records general data documentation and is widely applied in library environments.
These standards pertain, for the most part, to the description of geospatial data. Nevertheless, any type of data, geospatial or not, should be documented.
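Because Dublin Core is general-purpose, it can describe non-geospatial data as well. The sketch below builds a small record from the fifteen elements of the Dublin Core Metadata Element Set using Python's standard `xml.etree.ElementTree` module. The `<metadata>` wrapper element, the `to_dublin_core` helper and the sample values are assumptions made for this example; real repositories define their own container formats and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen core elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core(fields: dict) -> ET.Element:
    """Build an XML record; a list value repeats the element once per item."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = str(item)
    return root


if __name__ == "__main__":
    # Made-up sample values.
    record = to_dublin_core({
        "title": "Household survey responses",
        "creator": "Example Research Group",
        "subject": ["surveys", "households"],
        "date": "2015",
        "format": "text/csv",
        "language": "en",
    })
    print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown element names keeps the record limited to the shared vocabulary, which is what allows different catalogs to interpret it in the same way.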