Knowledge Graph Schema
Overview
The knowledge graph uses a directed graph model with 5 node types and 10 edge types. It is designed to capture the relationships among Nobel laureates, their publications, technical concepts, and cross-disciplinary inspiration.
Node Types
Laureate
Represents a Nobel Prize laureate.
┌──────────────────────┐
│ Laureate             │
│──────────────────────│
│ id: string           │ "laureate_779"
│ type: "Laureate"     │
│ name: string         │ "Aaron Ciechanover"
│ nationality: string  │ "Israeli"
│ birth_year: int      │ 1947
│ gender: string       │ "male"
└──────────────────────┘
Award
Represents a specific Nobel Prize award event.
┌──────────────────────┐
│ Award                │
│──────────────────────│
│ id: string           │ "award_2004_3_779"
│ type: "Award"        │
│ year: int            │ 2004
│ category: string     │ "Chemistry"
│ motivation: string   │ "for the discovery of ubiquitin-mediated..."
│ prize_amount: int    │
└──────────────────────┘
Work
Represents a scientific publication.
┌──────────────────────┐
│ Work                 │
│──────────────────────│
│ id: string           │ "W2078536640"
│ type: "Work"         │
│ title: string        │ "The ubiquitin-proteasome..."
│ year: int            │ 1998
│ abstract: string     │
│ keywords: string     │
│ citation_count: int  │ 1250
│ doi: string          │
└──────────────────────┘
Concept
Represents a technical or scientific concept extracted from publications.
┌──────────────────────┐
│ Concept              │
│──────────────────────│
│ id: string           │ "concept_ubiquitin_proteasome_pathway"
│ type: "Concept"      │
│ name: string         │ "Ubiquitin-Proteasome Pathway"
│ field: string        │ "Biology"
│ subfield: string     │ "Molecular Biology"
│ confidence: float    │ 0.95
│ first_appeared: int  │ 1980
└──────────────────────┘
Field
Represents a scientific discipline or domain.
┌──────────────────────┐
│ Field                │
│──────────────────────│
│ id: string           │ "field_biology"
│ type: "Field"        │
│ name: string         │ "Biology"
│ parent_field: string │ null
└──────────────────────┘
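The node diagrams above translate directly into a lookup table that can be used to sanity-check node dicts before they are written to the graph. The following is a minimal sketch, not the project's own validator: `NODE_ATTRS` and `validate_node` are hypothetical names, and the rule that any attribute may be null or omitted is an assumption (the diagrams only show `parent_field` as null).

```python
from typing import Any

# Attribute names and types copied from the node diagrams above.
NODE_ATTRS: dict[str, dict[str, type]] = {
    "Laureate": {"name": str, "nationality": str, "birth_year": int, "gender": str},
    "Award": {"year": int, "category": str, "motivation": str, "prize_amount": int},
    "Work": {"title": str, "year": int, "abstract": str, "keywords": str,
             "citation_count": int, "doi": str},
    "Concept": {"name": str, "field": str, "subfield": str,
                "confidence": float, "first_appeared": int},
    "Field": {"name": str, "parent_field": str},
}


def validate_node(node: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one node dict (empty if none)."""
    problems = []
    if not isinstance(node.get("id"), str):
        problems.append("missing or non-string 'id'")
    node_type = node.get("type")
    if node_type not in NODE_ATTRS:
        return problems + [f"unknown node type: {node_type!r}"]
    for key, value in node.items():
        if key in ("id", "type"):
            continue
        expected = NODE_ATTRS[node_type].get(key)
        if expected is None:
            problems.append(f"unexpected attribute {key!r} on {node_type}")
        elif value is None:
            continue  # Assumption: any attribute may be null, e.g. parent_field.
        elif expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue  # A JSON number such as 1 is accepted where a float is expected.
        elif not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{key!r} should be {expected.__name__}")
    return problems
```

Returning a list of messages instead of raising on the first problem makes it easier to report every issue in a large import at once.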
Edge Types
Relationship Map
graph LR
L[Laureate] -->|WON_AWARD| A[Award]
L -->|AUTHORED| W[Work]
W -->|CITES| W2[Work]
W -->|INTRODUCES| C[Concept]
W -->|APPLIES| C
C -->|BELONGS_TO| F[Field]
C -->|DERIVED_FROM| C2[Concept]
C -->|CROSS_INSPIRED| C3[Concept]
C -->|ENABLED| C4[Concept]
A -->|AWARDED_FOR| C
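The relationship map can also be written as a table of allowed endpoints, which makes it possible to reject edges that connect the wrong node types. This is an illustrative sketch only: `EDGE_ENDPOINTS` and `edge_is_well_typed` are hypothetical names that do not come from the project.

```python
# (source node type, target node type) for each edge type in the relationship map.
EDGE_ENDPOINTS: dict[str, tuple[str, str]] = {
    "WON_AWARD": ("Laureate", "Award"),
    "AUTHORED": ("Laureate", "Work"),
    "CITES": ("Work", "Work"),
    "INTRODUCES": ("Work", "Concept"),
    "APPLIES": ("Work", "Concept"),
    "BELONGS_TO": ("Concept", "Field"),
    "DERIVED_FROM": ("Concept", "Concept"),
    "CROSS_INSPIRED": ("Concept", "Concept"),
    "ENABLED": ("Concept", "Concept"),
    "AWARDED_FOR": ("Award", "Concept"),
}


def edge_is_well_typed(edge_type: str, source_type: str, target_type: str) -> bool:
    """True if the edge type exists and connects the expected node types."""
    return EDGE_ENDPOINTS.get(edge_type) == (source_type, target_type)


# Example: a CITES edge must connect two Work nodes.
print(edge_is_well_typed("CITES", "Work", "Work"))     # True
print(edge_is_well_typed("CITES", "Work", "Concept"))  # False
```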
Edge Definitions
WON_AWARD
| Property | Value |
| --- | --- |
| Source | Laureate |
| Target | Award |
| Attributes | year, portion |
| Semantics | Laureate received this Nobel Prize |
AUTHORED
| Property | Value |
| --- | --- |
| Source | Laureate |
| Target | Work |
| Attributes | position |
| Semantics | Laureate authored this paper |
CITES
| Property | Value |
| --- | --- |
| Source | Work |
| Target | Work |
| Attributes | (none) |
| Semantics | Paper A cites Paper B |
INTRODUCES
| Property | Value |
| --- | --- |
| Source | Work |
| Target | Concept |
| Attributes | confidence |
| Semantics | Paper first proposed or introduced this concept |
APPLIES
| Property | Value |
| --- | --- |
| Source | Work |
| Target | Concept |
| Attributes | confidence |
| Semantics | Paper applied or utilized this concept |
BELONGS_TO
| Property | Value |
| --- | --- |
| Source | Concept |
| Target | Field |
| Attributes | (none) |
| Semantics | Concept belongs to this scientific field |
DERIVED_FROM
| Property | Value |
| --- | --- |
| Source | Concept |
| Target | Concept |
| Attributes | year, description |
| Semantics | Concept evolved from another within the same field |
CROSS_INSPIRED ⭐
| Property | Value |
| --- | --- |
| Source | Concept |
| Target | Concept |
| Attributes | year, source_field, target_field, description |
| Semantics | Cross-disciplinary migration: a concept from one field inspired a concept in another |
This is the core edge type of the knowledge graph. Examples:
| Source | Target | Migration |
| --- | --- | --- |
| Optimization Theory | Stochastic Gradient Descent | Math → AI (~1960s) |
| Transformer | AlphaFold | AI → Structural Biology (2018) |
| X-ray Diffraction | DNA Double Helix | Physics → Molecular Biology (1953) |
| Statistical Mechanics | Boltzmann Machine | Physics → Machine Learning (1985) |
| Quantum Mechanics | Quantum Chemistry | Physics → Chemistry (1930s) |
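Going by the edge definitions, DERIVED_FROM links concepts within one field while CROSS_INSPIRED links concepts across fields. The sketch below builds an edge dict on that basis. It assumes that comparing the two field names is enough to pick the edge type, which may not match how the project actually assigns it. `make_concept_link` and the concept IDs in the usage example are hypothetical.

```python
def make_concept_link(source_id: str, target_id: str,
                      source_field: str, target_field: str,
                      year: int, description: str) -> dict:
    """Build a concept-to-concept edge, choosing the type from the two fields."""
    edge = {
        "source": source_id,
        "target": target_id,
        "year": year,
        "description": description,
    }
    if source_field != target_field:
        # Different fields: cross-disciplinary migration.
        edge["type"] = "CROSS_INSPIRED"
        edge["source_field"] = source_field
        edge["target_field"] = target_field
    else:
        # Same field: ordinary lineage.
        edge["type"] = "DERIVED_FROM"
    return edge


# Usage, based on the Statistical Mechanics -> Boltzmann Machine example.
# The IDs and the description text are illustrative, not taken from the dataset.
edge = make_concept_link(
    "concept_statistical_mechanics", "concept_boltzmann_machine",
    "Physics", "Machine Learning", 1985,
    "Energy-based formulation borrowed from statistical physics",
)
print(edge["type"])
```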
ENABLED
| Property | Value |
| --- | --- |
| Source | Concept |
| Target | Concept |
| Attributes | description |
| Semantics | One concept enabled or made possible another |
AWARDED_FOR
| Property | Value |
| --- | --- |
| Source | Award |
| Target | Concept |
| Attributes | (none) |
| Semantics | Nobel Prize was awarded for work on this concept |
JSON Serialization
{
  "nodes": [
    {
      "id": "laureate_779",
      "type": "Laureate",
      "name": "Aaron Ciechanover",
      "nationality": "Israeli",
      "birth_year": 1947,
      "gender": "male"
    },
    {
      "id": "concept_ubiquitin_proteasome_pathway",
      "type": "Concept",
      "name": "Ubiquitin-Proteasome Pathway",
      "field": "Biology",
      "subfield": "Molecular Biology"
    }
  ],
  "edges": [
    {
      "source": "laureate_779",
      "target": "award_2004_3_779",
      "type": "WON_AWARD",
      "year": 2004
    },
    {
      "source": "concept_a",
      "target": "concept_b",
      "type": "CROSS_INSPIRED",
      "year": 2001,
      "source_field": "Physics",
      "target_field": "Biology",
      "description": "Spectroscopy techniques applied to protein analysis"
    }
  ]
}
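A consumer of this JSON typically wants nodes indexed by ID and edges grouped by source. The sketch below does that with the standard library only and also collects edges whose endpoints are missing from `nodes`, as happens in abbreviated samples. `index_graph` is a hypothetical helper, not part of the project.

```python
import json


def index_graph(document: str):
    """Parse a serialized graph and index it for lookups.

    Returns (nodes_by_id, outgoing_edges_by_source, dangling_edges).
    """
    data = json.loads(document)
    nodes = {node["id"]: node for node in data["nodes"]}
    outgoing = {node_id: [] for node_id in nodes}
    dangling = []
    for edge in data["edges"]:
        if edge["source"] in nodes and edge["target"] in nodes:
            outgoing[edge["source"]].append(edge)
        else:
            # At least one endpoint is not declared in "nodes".
            dangling.append(edge)
    return nodes, outgoing, dangling


sample = """
{
  "nodes": [
    {"id": "laureate_779", "type": "Laureate", "name": "Aaron Ciechanover"},
    {"id": "award_2004_3_779", "type": "Award", "year": 2004}
  ],
  "edges": [
    {"source": "laureate_779", "target": "award_2004_3_779", "type": "WON_AWARD", "year": 2004},
    {"source": "concept_a", "target": "concept_b", "type": "CROSS_INSPIRED", "year": 2001}
  ]
}
"""
nodes, outgoing, dangling = index_graph(sample)
print(len(nodes), len(outgoing["laureate_779"]), len(dangling))
```

Reporting dangling edges separately, instead of failing, is a design choice for exploratory use; a stricter loader could raise an error.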
GraphML Export
The graph is also exported as GraphML (knowledge_graph.graphml), a format compatible with:
- Gephi — Open-source graph visualization
- Cytoscape — Network analysis platform
- yEd — Graph editor
- NetworkX — Python graph library
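For readers unfamiliar with GraphML, the sketch below writes a minimal directed GraphML document using only Python's standard library. It is not the project's exporter, which is not shown in this document. `to_graphml` and the key IDs `n_type` and `e_type` are hypothetical, and only the `type` attribute is exported to keep the example short.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes: list[dict], edges: list[dict]) -> str:
    """Serialize node and edge dicts to a GraphML string (type attribute only)."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # GraphML requires each data attribute to be declared with a <key> element.
    ET.SubElement(root, "key", {"id": "n_type", "for": "node",
                                "attr.name": "type", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "e_type", "for": "edge",
                                "attr.name": "type", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    for node in nodes:
        element = ET.SubElement(graph, "node", id=node["id"])
        ET.SubElement(element, "data", key="n_type").text = node["type"]
    for edge in edges:
        element = ET.SubElement(graph, "edge",
                                source=edge["source"], target=edge["target"])
        ET.SubElement(element, "data", key="e_type").text = edge["type"]
    return ET.tostring(root, encoding="unicode")


xml_text = to_graphml(
    [{"id": "laureate_779", "type": "Laureate"},
     {"id": "award_2004_3_779", "type": "Award"}],
    [{"source": "laureate_779", "target": "award_2004_3_779", "type": "WON_AWARD"}],
)
print(xml_text)
```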
Concept Graph Schema
The Concept Graph is a simplified representation of the knowledge graph, focusing on concepts and their relationships. It is designed to highlight the flow of ideas and their connections across disciplines.
Schema Details
- Nodes:
  - id: Unique identifier for the concept.
  - name: Human-readable name of the concept.
  - paper_count: Number of papers associated with the concept.
  - total_citations: Total citations received by papers linked to the concept.
- Edges:
  - source: Source concept ID.
  - target: Target concept ID.
  - type: Relationship type (e.g., CONCEPT_CITES).
  - total_citations: Total citations between the connected concepts.
Construction Process
- Extract concepts from papers.
- Deduplicate concepts across papers.
- Establish relationships based on citations and shared concepts.
- Export the graph in JSON and GraphML formats.
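The aggregation behind these steps can be sketched as follows, assuming concept extraction and deduplication have already produced a mapping from each paper to its concept IDs. Treating an edge's `total_citations` as the number of paper-level citations from papers of the source concept to papers of the target concept is an interpretation, not something this document confirms. `build_concept_graph` and its parameters are hypothetical names.

```python
from collections import Counter


def build_concept_graph(paper_concepts, paper_citations, citation_counts, concept_names):
    """Aggregate paper-level data into concept nodes and CONCEPT_CITES edges.

    paper_concepts:  paper ID -> set of concept IDs (already deduplicated)
    paper_citations: iterable of (citing paper ID, cited paper ID)
    citation_counts: paper ID -> citation count of that paper
    concept_names:   concept ID -> human-readable name
    """
    paper_count = Counter()
    node_citations = Counter()
    for paper, concepts in paper_concepts.items():
        for concept in concepts:
            paper_count[concept] += 1
            node_citations[concept] += citation_counts.get(paper, 0)

    edge_citations = Counter()
    for citing, cited in paper_citations:
        for src in paper_concepts.get(citing, ()):
            for dst in paper_concepts.get(cited, ()):
                if src != dst:  # Skip self-loops on a shared concept.
                    edge_citations[(src, dst)] += 1

    nodes = [
        {"id": c, "name": concept_names.get(c, c),
         "paper_count": paper_count[c], "total_citations": node_citations[c]}
        for c in sorted(paper_count)
    ]
    edges = [
        {"source": s, "target": t, "type": "CONCEPT_CITES", "total_citations": n}
        for (s, t), n in sorted(edge_citations.items())
    ]
    return {"nodes": nodes, "edges": edges}
```

The returned dict has the same shape as the JSON structure used for the Concept Graph, so it can be passed to `json.dumps` directly.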
Example JSON Structure
{
  "nodes": [
    {
      "id": "concept_1",
      "name": "Quantum Mechanics",
      "paper_count": 120,
      "total_citations": 4500
    }
  ],
  "edges": [
    {
      "source": "concept_1",
      "target": "concept_2",
      "type": "CONCEPT_CITES",
      "total_citations": 300
    }
  ]
}