Underlying concepts of Wikidata

Last updated on 2026-04-17 | Edit this page

Overview

Questions

  • What is a RDF triple?
  • What are the underlying components of RDF?
  • What tools and services are built on top of Wikidata?

Objectives

  • Know what a triple is, and relate structure of a Wikidata statement to traditional metadata field structure.
  • Know how linked data can create more context for patrons/users in library catalogs.
  • Know how linked data can improve recall in library catalogs.
  • Be aware of tools and services built on top of Wikidata, such as Scholia and InteractOA.

2.1 Conceptual foundations: ways of storing data


There are many types of database structures and systems. Two common database types are relational databases and graph databases. Understanding the commonalities and differencews between these structures helps to explain the uniqueness of Wikidata’s data structure.

2.1.1 Relational databases

A relational database is a set of formally described related tables from which data can be accessed or reassembled. This model organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row. each table/relation represents one “entity type” and these entities are connected via constrained relationships. This model is fully structured and mostly uses SQL (Structured Query Language) to retrive and manuplate data.

A single database table and its basic parts is demonstrated below. Note that each row is a set of ordered values that corresponds to a single data element. Each column in the table may be understood as an attribute, which is a common attribute, but for which each row has the data corresponding to that record. Together, the entire table consitutes a data element that can be related to other other tables.

Schematic of a data table in a relational database, which can be understood as a series of records (ordered tuple values) with various attributes, which can be related to other tables in the database through structured queries.

2.1.2 Graph / Semantic databases

Semantic web is an extension of the World Wide Web standards, which promote common formats and exchange protocols on the Web. For data exchange, the fundamental Web standard is the Resource Description Framework, or RDF. Rather than being defined by tables, this “graph” or semantic structure is defined by relationship statements. RDF outlines a protocol for encoding and transmitting graph data on the web.

RDF can be queried and analyzes using a language called SPARQL (Simple Protocol and RDF Query Language). This has its own syntax, but it is similar to how relational databases use SQL (Structured Query Language) to create and build queries. In SQL relational database terms, RDF data can also be processed as a table, but with only three columns – the subject column, the predicate column, and the object column.

A data structure diagram illustrating a possible connection between a list of triples, represented by a data dictionary, and a graph diagram which visualizes the relationships stipulated by the triples.

2.2 Conceptual foundations: RDF and Triples


The RDF defines a conceptual data model that is based on the idea of making statements about resources. Unlike a relational database, the data model defined by RDF is text-focused, and it is based on relating defined entities (as Wikidata calls them, items) that can be referred to by a Internationalized Resource Identifier (an IRI, which is nearly synonymous with a URL), and which can be connected or related to any other defined entity through a standard language. While the data structures can be complex, they rely on a basic structure called a triple, which consists of a subject and an object, which are linked together, or related, by a defined relationship called a predicate (as Wikidata calls it, a property). Here youcan read Wikipedia’s definition of a semantic triple.

Schematic illustration of an RDF Triple

The basic data statement is expressed in the form subject–predicate–object, also known as a triple. The subject denotes the resource. In Wikidata, each item, or Q node, is a triple subject. The object is usually another data entity, though it may also be a standalone value, which is related to the subject by the predicate relator. The predicate denotes traits or aspects of the resource, and expresses a relationship between the subject and the object, for example:

  • The British Library is-a library
  • John is-a person
  • John born-in 1980
  • John has-occupation engineer

Each of the above is a triple about the subject “John,” wither different predicates and objects.

As you can imagine, Wikidata has a huge number of data items (subjects), and it includes millions and millions of triple statements. RDF data are stores are also known as triplestores.

2.3 Wikidata concepts


Items
Items represents things and conceps, including people, places, events, subjects, and more. Examples mentioned previously include the British Library or Douglas Adams. Wikidata items have identifiers that start with letter “Q”, like Q42 for Douglas Adams.
Each item must have a label in one or more languages, optionally have alternative names and descrition.
Properties
Properties represents attributes of the subject such occupation and have identifiers that starts with letter “P” like: P106 for Occupation.
Claims
Claims are the triple statements, which combine the formation of Item and Property and value. For example: Douglas Adams (Q42) - occupation (P106) - comedian (Q245068). Note: value can be already stored in wikidata, therefore the bot assigns the Q number of the value instead.
Statement
A Claim is a part of a statement, a statement also includes: References, Ranks, and Qualifiers.
References
Used to store the source of the claim, using properties, such stated in, qoute, and etc.
Ranks
A useful component to deprecate outdated claims.
Qualifiers
Qualifiers are basically properties but on claims rather than items.
Discussion

Can you identify triple structures in library data?

Is data stored in the RDF triple format part of your work as a librarian? Take some time to think about if data stored in the RDF triple format is part of your work as a librarian. Can you give an example in the format of an RDF triple?

Challenge

Point out one RDF triple on the Wikidata item page of former astronaut Mae Jemison.

Got to the Wikidata page of Mae Jemison and point out one RDF triple. An RDF triple consists of a subject, a predicate and an object. Can you assign the three corresponding Wikidata terms?

Go to Wikidata and either search for “Mae Jemison” or enter the ID Q34091. In the picture below the statement “Mae C. Jemison - part of - NASA Astronaut Group 12” is an RDF triple. Wikidata_Main_Page
Screenshot of Wikidata Main Page

2.4 Tools and services using Wikidata


Wikidata’s open data can be used and reused by anyone. A number of tools and services have been built on top of Wikidata’s database, demonstrating its value as a linked open data resource.

The Linked Open Data Cloud

Wikidata is part of a much larger ecosystem of linked open datasets. The Linked Open Data Cloud visualizes the connections between hundreds of open datasets on the web. Note that this visualization is updated regularly.

Visualization of the Linked Open Data Cloud showing connections between hundreds of open datasets

Scholia

Scholia is a tool built on top of Wikidata that visualizes scholarly profiles and research outputs. It has no own database but queries Wikidata directly, which means any addition to Wikidata is immediately reflected in Scholia. It is particularly useful for librarians working with research information.

Discussion

Explore a Wikidata item and its Scholia profile

Part 1: Wikidata item

Go to the Wikidata page of Donna Haraway (Q253407), a feminist scholar and philosopher of science.

  • What statements can you find?
  • What identifiers are listed?
  • What links to other Wiki projects do you see?

Part 2: Scholia profile

Now go to the Scholia profile of Alex Bateman (Q18921408), a bioinformatician with many publications in Wikidata.

  • How many publications are listed per year?
  • What topics does his research cover?
  • Who are his most frequent co-authors?
  • Can you find the academic tree and citation statistics?

InteractOA

InteractOA is another example of a web service built on top of Wikidata. It visualizes genomic RNA interaction networks based on data stored in Wikidata, linked directly to the open-access articles that describe the evidence for these interactions. It demonstrates that Wikidata can serve as a sustainable, open database for specialized scientific knowledge.

Key Points
  • Triples are the basic data structure of graph databases, and they are the conceptual structure of Wikidata statements.
  • Wikidata items are denoted by a human-readable label and a short description, and a unique identifier that begins with a Q. These items are the subjects of linked Wikidata statements.
  • Wikidata defines relationships between items, also known as triple predicates, with Wikidata properties.
  • Wikidata statements can capture library information, such as relationships like creatorship, publication, aboutness, and more.
  • Wikidata is part of a larger Linked Open Data ecosystem, and its data can be reused to build tools and services such as Scholia, InteractOA, and others.