
Knowledge Graph Schema

Overview

The knowledge graph is a directed graph with 5 node types and 10 edge types. It captures the relationships between Nobel laureates, their awards and publications, the technical concepts in those publications, and cross-disciplinary inspiration between concepts.

Node Types

Laureate

Represents a Nobel Prize laureate.

┌─────────────────┐
│    Laureate      │
│─────────────────│
│ id: string       │  "laureate_779"
│ type: "Laureate" │
│ name: string     │  "Aaron Ciechanover"
│ nationality: str │  "Israeli"
│ birth_year: int  │  1947
│ gender: string   │  "male"
└─────────────────┘

Award

Represents a specific Nobel Prize award event.

┌─────────────────┐
│      Award       │
│─────────────────│
│ id: string       │  "award_2004_3_779"
│ type: "Award"    │
│ year: int        │  2004
│ category: string │  "Chemistry"
│ motivation: str  │  "for the discovery of ubiquitin-mediated..."
│ prize_amount: int│
└─────────────────┘

Work

Represents a scientific publication.

┌──────────────────┐
│      Work         │
│──────────────────│
│ id: string        │  "W2078536640"
│ type: "Work"      │
│ title: string     │  "The ubiquitin-proteasome..."
│ year: int         │  1998
│ abstract: string  │
│ keywords: string  │
│ citation_count: int│  1250
│ doi: string       │
└──────────────────┘

Concept

Represents a technical or scientific concept extracted from publications.

┌────────────────────┐
│     Concept         │
│────────────────────│
│ id: string          │  "concept_ubiquitin_proteasome_pathway"
│ type: "Concept"     │
│ name: string        │  "Ubiquitin-Proteasome Pathway"
│ field: string       │  "Biology"
│ subfield: string    │  "Molecular Biology"
│ confidence: float   │  0.95
│ first_appeared: int │  1980
└────────────────────┘

Field

Represents a scientific discipline or domain.

┌────────────────────┐
│     Field           │
│────────────────────│
│ id: string          │  "field_biology"
│ type: "Field"       │
│ name: string        │  "Biology"
│ parent_field: str   │  null
└────────────────────┘
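The five node types above can be sketched as Python dataclasses. This is an illustration only, not the project's own code: the field names and example IDs come from the boxes above, while the class layout, the choice of which fields are optional, and the `node_to_dict` helper are assumptions.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Laureate:
    id: str                              # e.g. "laureate_779"
    name: str
    nationality: Optional[str] = None
    birth_year: Optional[int] = None
    gender: Optional[str] = None
    type: str = "Laureate"


@dataclass
class Award:
    id: str                              # e.g. "award_2004_3_779"
    year: int
    category: str
    motivation: Optional[str] = None
    prize_amount: Optional[int] = None
    type: str = "Award"


@dataclass
class Work:
    id: str                              # e.g. "W2078536640"
    title: str
    year: Optional[int] = None
    abstract: Optional[str] = None
    keywords: Optional[str] = None
    citation_count: int = 0
    doi: Optional[str] = None
    type: str = "Work"


@dataclass
class Concept:
    id: str                              # e.g. "concept_ubiquitin_proteasome_pathway"
    name: str
    field: Optional[str] = None
    subfield: Optional[str] = None
    confidence: Optional[float] = None
    first_appeared: Optional[int] = None
    type: str = "Concept"


@dataclass
class Field:
    id: str                              # e.g. "field_biology"
    name: str
    parent_field: Optional[str] = None   # None for a top-level field
    type: str = "Field"


def node_to_dict(node):
    """Convert any node dataclass (hypothetical helper) into a plain dict."""
    return asdict(node)
```

Keeping `type` as a field with a fixed default means every serialized node carries its own type tag, which matches the `"type"` key used in the JSON format.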

Edge Types

Relationship Map

graph LR
    L[Laureate] -->|WON_AWARD| A[Award]
    L -->|AUTHORED| W[Work]
    W -->|CITES| W2[Work]
    W -->|INTRODUCES| C[Concept]
    W -->|APPLIES| C
    C -->|BELONGS_TO| F[Field]
    C -->|DERIVED_FROM| C2[Concept]
    C -->|CROSS_INSPIRED| C3[Concept]
    C -->|ENABLED| C4[Concept]
    A -->|AWARDED_FOR| C
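The relationship map fixes which node types each edge type may connect. A small lookup table makes that constraint checkable. The sketch below copies the ten source/target pairs from the map; the names `EDGE_SCHEMA` and `edge_is_valid` are hypothetical and not part of the project.

```python
# (source node type, target node type) for each edge type,
# copied from the relationship map.
EDGE_SCHEMA = {
    "WON_AWARD": ("Laureate", "Award"),
    "AUTHORED": ("Laureate", "Work"),
    "CITES": ("Work", "Work"),
    "INTRODUCES": ("Work", "Concept"),
    "APPLIES": ("Work", "Concept"),
    "BELONGS_TO": ("Concept", "Field"),
    "DERIVED_FROM": ("Concept", "Concept"),
    "CROSS_INSPIRED": ("Concept", "Concept"),
    "ENABLED": ("Concept", "Concept"),
    "AWARDED_FOR": ("Award", "Concept"),
}


def edge_is_valid(edge_type, source_type, target_type):
    """Return True if edge_type may connect source_type to target_type."""
    # Unknown edge types yield None, which never equals a tuple.
    return EDGE_SCHEMA.get(edge_type) == (source_type, target_type)
```

Because the graph is directed, the check is order-sensitive: `edge_is_valid("WON_AWARD", "Award", "Laureate")` is rejected even though both node types are correct.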

Edge Definitions

WON_AWARD

| Property | Value |
|---|---|
| Source | Laureate |
| Target | Award |
| Attributes | year, portion |
| Semantics | Laureate received this Nobel Prize |

AUTHORED

| Property | Value |
|---|---|
| Source | Laureate |
| Target | Work |
| Attributes | position |
| Semantics | Laureate authored this paper |

CITES

| Property | Value |
|---|---|
| Source | Work |
| Target | Work |
| Attributes | (none) |
| Semantics | Paper A cites Paper B |

INTRODUCES

| Property | Value |
|---|---|
| Source | Work |
| Target | Concept |
| Attributes | confidence |
| Semantics | Paper first proposed or introduced this concept |

APPLIES

| Property | Value |
|---|---|
| Source | Work |
| Target | Concept |
| Attributes | confidence |
| Semantics | Paper applied or utilized this concept |

BELONGS_TO

| Property | Value |
|---|---|
| Source | Concept |
| Target | Field |
| Attributes | (none) |
| Semantics | Concept belongs to this scientific field |

DERIVED_FROM

| Property | Value |
|---|---|
| Source | Concept |
| Target | Concept |
| Attributes | year, description |
| Semantics | Concept evolved from another within the same field |

CROSS_INSPIRED

| Property | Value |
|---|---|
| Source | Concept |
| Target | Concept |
| Attributes | year, source_field, target_field, description |
| Semantics | Cross-disciplinary migration: a concept from one field inspired a concept in another |

This is the core edge type of the knowledge graph. Examples:

| Source | Target | Migration |
|---|---|---|
| Optimization Theory | Stochastic Gradient Descent | Math → AI (~1960s) |
| Transformer | AlphaFold | AI → Structural Biology (2018) |
| X-ray Diffraction | DNA Double Helix | Physics → Molecular Biology (1953) |
| Statistical Mechanics | Boltzmann Machine | Physics → Machine Learning (1985) |
| Quantum Mechanics | Quantum Chemistry | Physics → Chemistry (1930s) |
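Since each CROSS_INSPIRED edge carries `source_field` and `target_field`, migrations between fields can be tallied directly from the edge list. The following is a minimal sketch under assumptions: the function name `field_migrations` and the concept IDs in the sample data are hypothetical, while the fields and years are taken from the examples above.

```python
from collections import Counter


def field_migrations(edges):
    """Count CROSS_INSPIRED edges per (source_field, target_field) pair."""
    counts = Counter()
    for edge in edges:
        if edge.get("type") != "CROSS_INSPIRED":
            continue  # other edge types carry no field pair
        counts[(edge["source_field"], edge["target_field"])] += 1
    return counts


# Sample edges: fields and years follow the examples table,
# the concept IDs are made up for illustration.
edges = [
    {"source": "concept_xray_diffraction", "target": "concept_dna_double_helix",
     "type": "CROSS_INSPIRED", "year": 1953,
     "source_field": "Physics", "target_field": "Molecular Biology"},
    {"source": "concept_statistical_mechanics", "target": "concept_boltzmann_machine",
     "type": "CROSS_INSPIRED", "year": 1985,
     "source_field": "Physics", "target_field": "Machine Learning"},
    {"source": "concept_a", "target": "concept_b", "type": "ENABLED"},
]

migrations = field_migrations(edges)
```

Here `migrations` holds one count for each of the two Physics migrations, and the ENABLED edge is ignored.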

ENABLED

| Property | Value |
|---|---|
| Source | Concept |
| Target | Concept |
| Attributes | description |
| Semantics | One concept enabled or made possible another |

AWARDED_FOR

| Property | Value |
|---|---|
| Source | Award |
| Target | Concept |
| Attributes | (none) |
| Semantics | Nobel Prize was awarded for work on this concept |

JSON Serialization

Full Graph Format

{
  "nodes": [
    {
      "id": "laureate_779",
      "type": "Laureate",
      "name": "Aaron Ciechanover",
      "nationality": "Israeli",
      "birth_year": 1947,
      "gender": "male"
    },
    {
      "id": "concept_ubiquitin_proteasome_pathway",
      "type": "Concept",
      "name": "Ubiquitin-Proteasome Pathway",
      "field": "Biology",
      "subfield": "Molecular Biology"
    }
  ],
  "edges": [
    {
      "source": "laureate_779",
      "target": "award_2004_3_779",
      "type": "WON_AWARD",
      "year": 2004
    },
    {
      "source": "concept_a",
      "target": "concept_b",
      "type": "CROSS_INSPIRED",
      "year": 2001,
      "source_field": "Physics",
      "target_field": "Biology",
      "description": "Spectroscopy techniques applied to protein analysis"
    }
  ]
}
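A consumer of this format typically needs nodes indexed by ID and a check that every edge endpoint exists. The sketch below uses only the standard `json` module; the helper names `load_graph` and `dangling_edges` are hypothetical, and the sample is a trimmed version of the example above.

```python
import json


def load_graph(text):
    """Parse the full graph format and return (nodes_by_id, edges)."""
    data = json.loads(text)
    nodes_by_id = {node["id"]: node for node in data.get("nodes", [])}
    return nodes_by_id, data.get("edges", [])


def dangling_edges(nodes_by_id, edges):
    """Return edges whose source or target ID has no matching node."""
    return [
        edge for edge in edges
        if edge["source"] not in nodes_by_id or edge["target"] not in nodes_by_id
    ]


sample = """
{
  "nodes": [
    {"id": "laureate_779", "type": "Laureate", "name": "Aaron Ciechanover"}
  ],
  "edges": [
    {"source": "laureate_779", "target": "award_2004_3_779",
     "type": "WON_AWARD", "year": 2004}
  ]
}
"""

nodes_by_id, edges = load_graph(sample)
missing = dangling_edges(nodes_by_id, edges)
```

In this trimmed sample the Award node is absent, so the WON_AWARD edge is reported in `missing`. A complete export would be expected to produce an empty list.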

GraphML Export

The graph is also exported as GraphML (knowledge_graph.graphml), which can be opened with:

  • Gephi — Open-source graph visualization
  • Cytoscape — Network analysis platform
  • yEd — Graph editor
  • NetworkX — Python graph library
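The project's export code is not shown here, so the following is only a sketch of how node and edge dicts in the JSON shape above could be written as GraphML with Python's standard `xml.etree.ElementTree`. The function name `to_graphml`, the key IDs (`n_type`, `n_name`, `e_type`), and the decision to export only the `type` and `name` attributes are assumptions.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialize nodes and edges to a GraphML string (type and name only)."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # GraphML requires attributes to be declared as <key> elements first.
    for key_id, scope, name in [
        ("n_type", "node", "type"),
        ("n_name", "node", "name"),
        ("e_type", "edge", "type"),
    ]:
        ET.SubElement(root, "key", {
            "id": key_id, "for": scope,
            "attr.name": name, "attr.type": "string",
        })
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node in nodes:
        el = ET.SubElement(graph, "node", {"id": node["id"]})
        ET.SubElement(el, "data", {"key": "n_type"}).text = node["type"]
        ET.SubElement(el, "data", {"key": "n_name"}).text = node.get("name", "")
    for edge in edges:
        el = ET.SubElement(graph, "edge", {
            "source": edge["source"], "target": edge["target"],
        })
        ET.SubElement(el, "data", {"key": "e_type"}).text = edge["type"]
    return ET.tostring(root, encoding="unicode")
```

Setting `edgedefault="directed"` preserves the direction of every relationship, which matters because edge types such as CITES and DERIVED_FROM are not symmetric.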

Concept Graph Schema

The Concept Graph is a simplified representation of the knowledge graph, focusing on concepts and their relationships. It is designed to highlight the flow of ideas and their connections across disciplines.

Schema Details

  • Nodes:
      • id: Unique identifier for the concept.
      • name: Human-readable name of the concept.
      • paper_count: Number of papers associated with the concept.
      • total_citations: Total citations received by papers linked to the concept.
  • Edges:
      • source: Source concept ID.
      • target: Target concept ID.
      • type: Relationship type (e.g., CONCEPT_CITES).
      • total_citations: Total citations between the connected concepts.

Construction Process

  1. Extract concepts from papers.
  2. Deduplicate concepts across papers.
  3. Establish relationships based on citations and shared concepts.
  4. Export the graph in JSON and GraphML formats.
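Steps 1 to 3 can be sketched as follows. The details are assumptions rather than the project's actual pipeline: concepts are taken as already extracted per paper, deduplication uses a simple case- and whitespace-insensitive key, and a CONCEPT_CITES edge is weighted by the number of paper-level citations that link the two concepts. The names `normalize` and `build_concept_graph` are hypothetical.

```python
from collections import defaultdict


def normalize(name):
    """Deduplication key: lowercase with collapsed whitespace (assumed rule)."""
    return " ".join(name.lower().split())


def build_concept_graph(papers, citations):
    """Build a concept graph.

    papers: {paper_id: {"concepts": [str, ...], "citation_count": int}}
    citations: iterable of (citing_paper_id, cited_paper_id) pairs
    """
    nodes = {}
    paper_concepts = {}
    for paper_id, paper in papers.items():
        keys = set()
        for name in paper["concepts"]:
            key = normalize(name)
            keys.add(key)
            # The first spelling seen becomes the display name.
            nodes.setdefault(key, {
                "id": "concept_" + key.replace(" ", "_"),
                "name": name, "paper_count": 0, "total_citations": 0,
            })
        for key in keys:
            nodes[key]["paper_count"] += 1
            nodes[key]["total_citations"] += paper["citation_count"]
        paper_concepts[paper_id] = keys

    # Lift each paper-level citation to the concepts of both papers.
    weights = defaultdict(int)
    for citing, cited in citations:
        for src in paper_concepts.get(citing, ()):
            for dst in paper_concepts.get(cited, ()):
                if src != dst:  # skip self-loops on a shared concept
                    weights[(src, dst)] += 1

    edges = [
        {"source": nodes[s]["id"], "target": nodes[d]["id"],
         "type": "CONCEPT_CITES", "total_citations": w}
        for (s, d), w in sorted(weights.items())
    ]
    return {"nodes": list(nodes.values()), "edges": edges}
```

The returned dict has the same `nodes`/`edges` shape as the JSON structure, so step 4 reduces to passing it to `json.dumps`.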

Example JSON Structure

{
  "nodes": [
    {
      "id": "concept_1",
      "name": "Quantum Mechanics",
      "paper_count": 120,
      "total_citations": 4500
    }
  ],
  "edges": [
    {
      "source": "concept_1",
      "target": "concept_2",
      "type": "CONCEPT_CITES",
      "total_citations": 300
    }
  ]
}
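One way to read a concept graph in this shape is to sum the `total_citations` weights on incoming CONCEPT_CITES edges, which gives a rough measure of how heavily other concepts draw on each one. The sketch below assumes the structure shown above; the function name `incoming_citation_weight` is hypothetical.

```python
import json
from collections import Counter


def incoming_citation_weight(graph):
    """Sum total_citations over incoming CONCEPT_CITES edges per concept ID."""
    weights = Counter()
    for edge in graph["edges"]:
        if edge["type"] == "CONCEPT_CITES":
            weights[edge["target"]] += edge["total_citations"]
    return weights


concept_graph = json.loads("""
{
  "nodes": [
    {"id": "concept_1", "name": "Quantum Mechanics",
     "paper_count": 120, "total_citations": 4500}
  ],
  "edges": [
    {"source": "concept_1", "target": "concept_2",
     "type": "CONCEPT_CITES", "total_citations": 300}
  ]
}
""")

weights = incoming_citation_weight(concept_graph)
```

For the example data, `weights["concept_2"]` is 300, and concepts with no incoming edges read as 0 because `Counter` returns 0 for missing keys.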