There is a quiet room in a university library somewhere: rows of terminals, the hum of servers, a screen displaying something that looks like a family tree drawn by a mathematician. Each node is a concept: a person, an institution, a historical event, a book. The edges between them are labeled with relationships such as "founded by," "influenced," "taught at," and "published in." What looks like a diagram is actually a database. What looks like a database is actually a new way of thinking about knowledge itself.
This is the world of knowledge graphs, and it is reshaping how information is organized, shared, and discovered across education, research, and industry. At the center of one corner of this world sits a piece of open-source software called Wikibase, a tool that has become, for thousands of organizations, the infrastructure behind collaborative knowledge building.
The working title of this piece mentioned a solo archivist. The sources gathered here tell a different story, and perhaps a more instructive one: the story of how open-source software, built by communities rather than corporations, became the backbone of the world's most ambitious collaborative knowledge projects.
What Is a Knowledge Graph and Why It Matters for Learning
To understand what Wikibase does, it helps to understand what a knowledge graph actually is. A traditional relational database stores information in tables of rows and columns, like a spreadsheet. Each piece of data sits in its own cell, and relationships between pieces of data are not stored directly; they have to be reconstructed at query time, often through complicated queries that join several tables.
A knowledge graph takes a different approach. According to GeeksforGeeks's overview of open-source graph databases, these systems "store data in the form of nodes and edges. Nodes are vertices that store the data, while an edge represents the relationship between nodes." A family tree is a familiar example: the members are nodes, and the lines connecting them represent parent-child, sibling, or marriage relationships.
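The node-and-edge idea can be sketched in a few lines of Python. This is only an illustration, assuming a tiny family tree stored as labeled edges; the names, the relationship labels, and the `related` helper are all hypothetical and do not come from any particular graph database.

```python
from collections import defaultdict

# Hypothetical family members and relationship labels, for illustration only.
# Each edge is (source node, relationship label, target node).
edges = [
    ("Ada", "parent_of", "Ben"),
    ("Ada", "parent_of", "Cara"),
    ("Ben", "sibling_of", "Cara"),
    ("Ben", "married_to", "Dev"),
]

# Adjacency index: node -> list of (relationship label, neighbouring node).
graph = defaultdict(list)
for source, label, target in edges:
    graph[source].append((label, target))


def related(node, label):
    """Return the nodes reached from `node` along edges carrying `label`."""
    return [target for edge_label, target in graph[node] if edge_label == label]


children_of_ada = related("Ada", "parent_of")
print(children_of_ada)
```

The relationship is stored as data on the edge itself, so asking "who are Ada's children?" is a lookup rather than something inferred from separate tables.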
What makes this approach powerful is context. When LinkedIn shows you a first-degree or second-degree connection, that mapping happens because of graph structure. When Amazon recommends a product based on your browsing history, it is drawing on relationships encoded in a graph database. The same principle applies to academic research, museum catalogs, library collections, and learning resource databases.
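As a rough sketch of how a "degree of connection" could be computed from graph structure, the example below runs a breadth-first search over a small, made-up network. It is not how LinkedIn or any other service actually implements the feature; the `connections` data and the `connection_degree` function are assumptions for illustration.

```python
from collections import deque

# Hypothetical undirected network: each person maps to their direct connections.
connections = {
    "you": {"alice", "bob"},
    "alice": {"you", "carol"},
    "bob": {"you"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}


def connection_degree(network, start, goal):
    """Return the number of hops between two people, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        for neighbour in network.get(person, ()):
            if neighbour == goal:
                return depth + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return None


print(connection_degree(connections, "you", "alice"))
print(connection_degree(connections, "you", "carol"))
```

Breadth-first search visits people in order of distance, so the first time it reaches the target it has found the shortest chain of connections.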
For KnowledgePosts readers (people researching practitioners, frameworks, books, and ideas), this matters because knowledge graphs are increasingly how curated, structured information is stored and delivered. Understanding how they work is understanding the infrastructure behind the resources you rely on.
The Graph Database Landscape: Open Source in 2025
The open-source graph database ecosystem has grown significantly in recent years. According to GeeksforGeeks's survey of the field, these systems offer three major advantages over relational databases: high flexibility (schemas can change without affecting existing functions), high performance (even with complex transactions), and high efficiency (queries are shorter, and traversing relationships is a rapid process).
Neo4j, one of the oldest and most established open-source graph databases, has over 12,000 stars on GitHub and provides what GeeksforGeeks describes as "a native graph database" that "implements a graph model right to the storage level." ArangoDB, another popular option, has over 13,000 GitHub stars and is designed for scalability and fast performance.
But graph databases are the engine. Knowledge graphs are the purpose: the specific application of graph technology to model real-world knowledge and the relationships between concepts. And for collaborative, open-knowledge applications, one project has become central: Wikibase.
Wikibase: The Open-Source Suite for Collaborative Knowledge Bases
Wikibase describes itself as "an open-source software suite for creating collaborative knowledge bases, opening the door to the Linked Open Data web." The project is not a single tool but a comprehensive ecosystem designed around several core principles.
First, there is ontology. "Knowledge is never one-dimensional," the Wikibase documentation notes, "and one ontology doesn't fit all. Different contexts require different ways of modeling the world; we build software to support that diversity." Rather than imposing a single hierarchical structure on knowledge, Wikibase allows each organization to define its own way of categorizing and connecting information while still enabling that information to be shared across contexts.
Second, there is federation. In a federated setup, according to Wikibase, "data stays where it is created but can still be accessed and referenced from elsewhere. Each project keeps its own structure and its own way of working. At the same time, it can link to others where it makes sense." This is a crucial concept for understanding collaborative knowledge building: individual institutions retain ownership and control of their data, but that data becomes part of a larger, interconnected web.
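The federation principle can be illustrated with a toy model. The sketch below is not Wikibase's API or its actual federation mechanism; it assumes two hypothetical knowledge bases held as plain dictionaries, a made-up `prefix:id` reference format, and a hypothetical `resolve` function, purely to show data staying where it was created while being referenced from elsewhere.

```python
# Two hypothetical knowledge bases, each with its own identifiers and fields.
# The library record points at the museum's record instead of copying it.
library_kb = {
    "B1": {"label": "On the Origin of Species", "author": "museum:P7"},
}
museum_kb = {
    "P7": {"name": "Charles Darwin", "born": 1809},
}

# Registry mapping a prefix to the knowledge base that owns that data.
registry = {"library": library_kb, "museum": museum_kb}


def resolve(reference):
    """Look up a 'prefix:id' reference in the knowledge base that owns it."""
    prefix, _, local_id = reference.partition(":")
    store = registry.get(prefix)
    if store is None:
        return None  # unknown knowledge base
    return store.get(local_id)


author = resolve(library_kb["B1"]["author"])
print(author["name"])
```

Because the library stores only a reference, the museum remains the single owner of the author record, and any correction it makes is visible to everyone who links to it.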
Third, there is linked open data: reusable data, linked and understood across varying contexts. "Fewer silos, less duplication, and more collaboration," as Wikibase puts it. This is the vision that connects Wikibase to the broader movement for open knowledge: information that is not trapped in proprietary systems but freely available for reuse, connection, and discovery.
Who Uses Wikibase and What They Build With It
The Wikibase documentation identifies several key user communities, each with distinct needs that the software is designed to address.
Libraries frequently find Wikibase well suited to their requirements. The documentation notes that Wikibase "provides librarians with powerful collaboration tools with no need for prior in-depth knowledge of linked open data principles." This accessibility is significant: it means that knowledge organization expertise does not require deep technical background.
Research institutions benefit from what Wikibase calls "the greenfield conditions often needed by collections with large bodies of custom data in order to create an accurate representation of the unique information in their hands." For museums, archives, and research collections with unusual or specialized data, Wikibase offers flexibility that more rigid database systems cannot.
The scientific community represents perhaps the most obvious use case. Wikibase describes it this way: "Huge amounts of raw data, multiple collaborators, every project's data structure wildly different and ripe for federation." That description captures why graph-based, federated knowledge management has become essential for modern research. Different labs, institutions, and research initiatives can maintain their own data while still contributing to and drawing from a shared knowledge ecosystem.
For KnowledgePosts readers, this has practical implications. If you have ever used a curated database of research papers, a museum's online collection, or a library's specialized catalog, there is a good chance you have interacted with data structured through tools like Wikibase, even if you did not know it.
The Knowledge Graph Ecosystem: Beyond Wikibase
Wikibase is not the only significant open-source project in the knowledge graph space. The broader ecosystem includes tools for building, maintaining, and using knowledge graphs in various contexts.
For developers working at the intersection of knowledge graphs and artificial intelligence, Graphiti offers a different approach. Described as software to "Build Real-Time Knowledge Graphs for AI Agents," Graphiti has attracted significant attention from the developer community, accumulating over 27,900 GitHub stars, nearly 2,800 forks, and 245 open issues, all of which point to active development and community engagement. While Wikibase focuses on collaborative human knowledge management, Graphiti is designed to help AI systems maintain and update their understanding of information over time.
The distinction matters for understanding the broader knowledge graph landscape. Wikibase is optimized for human collaboration and long-term preservation of structured knowledge. Graphiti and similar projects are optimized for the dynamic, rapidly-changing knowledge requirements of AI systems. Both represent important directions in how knowledge graphs are being developed and applied.
For educators and knowledge workers, this ecosystem represents infrastructure that is increasingly relevant. As AI tools become more integrated into learning and research workflows, understanding how these systems represent and update knowledge becomes part of digital literacy.
Scale and Ambition: What the Largest Knowledge Graphs Look Like
While Wikibase focuses on federated, collaborative knowledge building, other projects have pursued knowledge graphs at massive scale. According to a 2020 report from The Batch, DiffBot, a Stanford University offshoot founded in 2008, built a system that reads web code, parses text, classifies images, and assembles what it describes as the world's largest knowledge graph.
The scale is staggering. DiffBot's web crawler, according to the report, "rebuilds the graph every four to five days, adding roughly 150 million new subject-object-verb associations monthly." The knowledge graph "encompasses more than 10 billion entities (people, businesses, products, locations, and so on) and a trillion bits of information about those entities." The system has captured subject-verb-object associations "from 98 percent of the internet in nearly 50 languages."
Over 400 companies, including Adidas, Nasdaq, and SnapChat, have adopted DiffBot's technology. The company uses image recognition to classify content into 20 categories and analyzes text to find statements composed of a subject, verb, and object. A suite of machine learning techniques, including "knowledge fusion" (which weighs the trustworthiness of various sources), associates new information and overwrites outdated information.
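To make the idea of weighing source trustworthiness concrete, here is a deliberately simplified sketch. It is not DiffBot's algorithm, which the report does not specify in detail; the trust scores, the example statements, and the `fuse` function are all invented for illustration. Each competing value for a subject-verb pair accumulates the trust of the sources asserting it, and the highest-scoring value wins.

```python
from collections import defaultdict

# Hypothetical trust scores per source, between 0 and 1.
source_trust = {"official_site": 0.9, "news_article": 0.6, "forum_post": 0.2}

# Hypothetical extracted statements: (subject, verb, object, source).
statements = [
    ("Acme Corp", "headquartered_in", "Berlin", "official_site"),
    ("Acme Corp", "headquartered_in", "Munich", "forum_post"),
    ("Acme Corp", "headquartered_in", "Munich", "news_article"),
    ("Acme Corp", "founded_in", "1999", "news_article"),
]


def fuse(statements, trust):
    """Pick, for each (subject, verb), the object with the most total trust."""
    scores = defaultdict(lambda: defaultdict(float))
    for subject, verb, obj, source in statements:
        # Unknown sources contribute nothing rather than raising an error.
        scores[(subject, verb)][obj] += trust.get(source, 0.0)
    return {
        key: max(candidates, key=candidates.get)
        for key, candidates in scores.items()
    }


fused = fuse(statements, source_trust)
print(fused[("Acme Corp", "headquartered_in")])
```

In this toy data, two weaker sources together still fall short of one highly trusted source, which is the kind of trade-off a real fusion system would have to tune far more carefully.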
"A knowledge graph that encompasses the entire internet could reveal a wealth of obscure connections between people, places, and things. This tool could also be useful for machine learning engineers who aim to train models that have a good grasp of facts."
The Batch noted that while knowledge graphs have proven powerful for companies such as Google and Microsoft, they "have received little attention in academia relative to their practical impact." Tools to automatically build large knowledge graphs, the publication suggested, "will help more teams reap their benefits."
Why This Matters for KnowledgePosts Readers
There is a practical dimension to understanding knowledge graphs that may not be immediately obvious. If you are researching practitioners, frameworks, books, or ideas, or trying to understand how a particular learning resource, educational approach, or knowledge system works, you are already navigating knowledge that has been structured, organized, and connected by someone.
Knowledge graphs are increasingly how that structuring happens. The databases behind academic search engines, the catalogs that allow you to find books across library systems, and the structured collections that power learning management systems are more and more often built on graph principles. Understanding the architecture of knowledge, even at a general level, helps you evaluate the resources you rely on and understand why some knowledge tools work better than others.
More specifically, the federated, open-source model that Wikibase represents offers lessons for anyone building or evaluating knowledge resources. The principles of linked open data (keeping data in its home context while enabling connection and reuse) offer an alternative to both monolithic proprietary systems and chaotic free-for-all approaches. The emphasis on ontology diversity (accepting that different contexts require different ways of modeling the world) offers humility about the limits of any single categorization system.
The Human Cost and the Human Promise
The original working title for this piece asked "what it cost him," a reference to an individual archivist. The sources do not document a single person who built the world's most-used open-source knowledge graph. What they document is something more instructive: a community, a set of principles, and an infrastructure that has enabled thousands of organizations to build collaborative knowledge bases.
The cost of building open knowledge infrastructure is real, but it is distributed. It appears in the hours that contributors to Wikibase spend maintaining and improving the software. It appears in the institutions that dedicate resources to building and maintaining their own knowledge bases. It appears in the developers who build on top of these systems and the users who navigate them.
The promise is equally distributed: knowledge that does not disappear when a company fails, that does not require expensive licenses to access, that can be connected across contexts without permission from a gatekeeper. For educators, researchers, and anyone who cares about the long-term preservation and accessibility of knowledge, this is not a small thing.
A Summary: Key Facts About the Open Knowledge Graph Landscape
| Project | Type | Key Characteristic | Community/Adoption |
|---|---|---|---|
| Wikibase | Open-source knowledge base software suite | Federated, linked open data, collaborative ontology building | GLAM organizations (libraries, museums, archives, research institutions) |
| Graphiti | Open-source tool for AI agents | Real-time knowledge graph construction for AI systems | 27,900+ GitHub stars, active developer community |
| Neo4j | Open-source graph database | Native graph implementation, ACID transactions, Cypher query language | 12,000+ GitHub stars, broad enterprise adoption |
| ArangoDB | Open-source graph database | Scalability, fast performance, integrated search engine | 13,000+ GitHub stars |
| DiffBot | Commercial knowledge graph (Stanford offshoot, founded 2008) | Automatic extraction from web, 10+ billion entities, 150M new associations monthly | 400+ company customers including Adidas, Nasdaq, SnapChat |
Where to Read Further
For readers who want to explore the open-source knowledge graph ecosystem in more depth, several starting points emerge from the sources:
- The official Wikibase site offers comprehensive documentation, case studies, and information about the software suite's various deployment options, from cloud-hosted services for novices to self-installed configurations for experienced developers.
- The awesome-knowledge-graph repository on GitHub provides a curated list of learning materials, databases, tools, and other resources related to knowledge graphs, a useful starting point for anyone building in this space.
- The Batch's report on DiffBot offers a detailed look at how one of the largest-scale knowledge graph projects approaches automatic knowledge extraction from the web.
The world of knowledge graphs is complex, technically demanding, and, for those who care about how information is organized and shared, increasingly important. The open-source projects and communities documented in these sources represent one vision of how that organization might work: collaborative, federated, and built on the principle that knowledge grows through connection.
Whether you are an educator evaluating learning resources, a researcher navigating academic databases, or simply someone who has ever searched for information and found something useful, you are touching infrastructure built on these principles. Understanding that infrastructure, even in broad strokes, is part of understanding the knowledge landscape you inhabit.



