Nobel Prize Interdisciplinary Knowledge Graph

Unveiling a century of cross-disciplinary inspiration in science through knowledge graphs

Overview

This project builds a cross-disciplinary knowledge graph of technical concepts, anchored on Nobel Prize laureates and their core publications. It visualizes the key developmental pathways of the natural sciences over the past 100 years. Its focus is cross-disciplinary inspiration and technology transfer: how a concept or technique developed in one field is taken up in another.

Core Outputs

An interactive, queryable knowledge graph where:

Nodes = key technical concepts (e.g., SGD, Transformer, X-ray crystallography, PCR)
Edges = evolutionary relationships between concepts (derivation, application migration, fusion)
Each concept node is associated with the corresponding Nobel laureate, representative papers, discipline, and temporal coordinates
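The node and edge model described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual schema. The class names (`ConceptNode`, `ConceptEdge`, `ConceptGraph`), the field names and the relation labels are all hypothetical. The project lists `networkx` among its dependencies, but how `graph_builder.py` stores these attributes is not documented here.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical labels for the three edge types named above (not enforced at runtime).
Relation = Literal["derivation", "application_migration", "fusion"]


@dataclass(frozen=True)
class ConceptNode:
    """A technical concept plus its Nobel context (field names are hypothetical)."""
    name: str
    laureates: tuple[str, ...]
    papers: tuple[str, ...]
    discipline: str
    year: int  # the "temporal coordinate"; assumed here to be a single year


@dataclass(frozen=True)
class ConceptEdge:
    """A directed evolutionary link from an earlier concept to a later one."""
    source: str
    target: str
    relation: Relation


@dataclass
class ConceptGraph:
    nodes: dict[str, ConceptNode] = field(default_factory=dict)
    edges: list[ConceptEdge] = field(default_factory=list)

    def add_node(self, node: ConceptNode) -> None:
        self.nodes[node.name] = node

    def add_edge(self, edge: ConceptEdge) -> None:
        # Refuse edges whose endpoints have not been added as nodes.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError(f"unknown concept in edge {edge.source!r} -> {edge.target!r}")
        self.edges.append(edge)

    def successors(self, name: str) -> list[str]:
        """Concepts that directly evolved from `name`."""
        return [e.target for e in self.edges if e.source == name]


if __name__ == "__main__":
    g = ConceptGraph()
    # Illustrative entries only; papers are left empty rather than invented.
    g.add_node(ConceptNode("X-ray crystallography", ("W. H. Bragg", "W. L. Bragg"), (), "Physics", 1915))
    g.add_node(ConceptNode("Protein crystallography", ("M. F. Perutz", "J. C. Kendrew"), (), "Chemistry", 1962))
    g.add_edge(ConceptEdge("X-ray crystallography", "Protein crystallography", "application_migration"))
    print(g.successors("X-ray crystallography"))
```

An edge whose source and target carry different `discipline` values is exactly the kind of link the project describes as cross-disciplinary.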

Key Features

Feature	Description
Multi-source Data Integration	Combines data on 757 laureates, 245K+ papers, and 9M+ citation records
LLM-powered Concept Extraction	Uses GPT-4o to extract structured technical concepts from papers
Cross-disciplinary Detection	Automatically identifies concept migration across scientific fields
Interactive Visualization	Timeline, network graph, and heatmap visualizations
Insight Reports	Auto-generated analysis of hub concepts and field interactions
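The cross-disciplinary detection and hub-concept features can be illustrated with a few lines of standard-library Python. This sketch assumes two simple rules that the project may or may not use:

- An edge counts as cross-disciplinary when its two concepts carry different discipline labels.
- A concept's "hub" score is its total degree.

The function names and the toy data are hypothetical. The real analysis in `insight_analyzer.py` may use richer measures, such as betweenness centrality.

```python
from collections import Counter

Edge = tuple[str, str]  # (source concept, target concept)


def cross_disciplinary_edges(discipline: dict[str, str], edges: list[Edge]) -> list[Edge]:
    """Keep only edges whose endpoints belong to different disciplines."""
    return [(s, t) for s, t in edges if discipline[s] != discipline[t]]


def field_interactions(discipline: dict[str, str], edges: list[Edge]) -> Counter:
    """Count (source field, target field) pairs among cross-disciplinary edges.

    These counts are the kind of matrix a field-interaction heatmap would plot.
    """
    return Counter(
        (discipline[s], discipline[t]) for s, t in cross_disciplinary_edges(discipline, edges)
    )


def hub_concepts(edges: list[Edge], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank concepts by total degree (in + out), a simple stand-in for hub-ness."""
    degree: Counter = Counter()
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
    return degree.most_common(top_n)


if __name__ == "__main__":
    # Toy data for illustration, not output from the pipeline.
    discipline = {
        "X-ray diffraction": "Physics",
        "X-ray crystallography": "Physics",
        "Protein crystallography": "Chemistry",
        "DNA double helix": "Physiology or Medicine",
    }
    edges = [
        ("X-ray diffraction", "X-ray crystallography"),
        ("X-ray crystallography", "Protein crystallography"),
        ("X-ray crystallography", "DNA double helix"),
    ]
    print(cross_disciplinary_edges(discipline, edges))
    print(field_interactions(discipline, edges))
    print(hub_concepts(edges, top_n=1))  # [('X-ray crystallography', 3)]
```

Degree is the crudest hub measure: it rewards concepts with many direct links. A path-based centrality would instead highlight "bridge" concepts that connect otherwise separate fields.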

Quick Start

```bash
# Install dependencies
uv add polars pandas pyarrow networkx pyvis plotly requests openai pyyaml python-dotenv tqdm scikit-learn

# Run the full pipeline
uv run python main.py

# Or run a single phase, or skip the LLM step
uv run python main.py --phase 1      # Data loading only
uv run python main.py --skip-llm     # Skip LLM concept extraction
```

See Getting Started for detailed setup instructions.
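The two command-line options above suggest a simple phase-selection scheme. The following `argparse` sketch shows one way `--phase` and `--skip-llm` could combine. It is hypothetical and does not reproduce the contents of `main.py`. The phase labels are taken from the module comments in the project structure. The sketch also assumes that `--skip-llm` drops only LLM concept extraction (Phase 2b) and keeps OpenAlex enrichment (Phase 2a); that split is an assumption.

```python
import argparse
from typing import Optional

# Hypothetical phase labels following the module comments in the project tree;
# the exact values main.py accepts are an assumption.
PHASES = ["1", "1.5", "2a", "2b", "3+4", "5", "6"]
LLM_PHASES = {"2b"}  # assumed: only LLM concept extraction calls GPT-4o


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Nobel knowledge-graph pipeline (sketch)")
    parser.add_argument("--phase", choices=PHASES, help="run only this phase")
    parser.add_argument("--skip-llm", action="store_true", help="skip LLM concept extraction")
    return parser.parse_args(argv)


def phases_to_run(args: argparse.Namespace) -> list[str]:
    """Full pipeline by default, one phase with --phase, minus LLM phases with --skip-llm."""
    selected = [args.phase] if args.phase else list(PHASES)
    if args.skip_llm:
        selected = [p for p in selected if p not in LLM_PHASES]
    return selected


if __name__ == "__main__":
    # Equivalent to: python main.py --skip-llm
    print(phases_to_run(parse_args(["--skip-llm"])))
```

Skipping the LLM step is useful when no OpenAI API key is configured, or when previously extracted concepts should be reused.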

Architecture at a Glance

```mermaid
flowchart LR
    A[Raw Data<br>CSV/JSON] --> B[Phase 1<br>Data Loading]
    B --> C[Phase 1.5<br>Content Enrichment]
    C --> D[Phase 2<br>Concept Extraction]
    D --> E[Phase 3+4<br>Graph Construction]
    E --> F[Phase 5<br>Visualization]
    E --> G[Phase 6<br>Insight Analysis]

    style A fill:#f9f,stroke:#333
    style E fill:#bbf,stroke:#333
    style F fill:#bfb,stroke:#333
    style G fill:#bfb,stroke:#333
```
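Read as a dependency graph, the flowchart says that Phases 5 and 6 each need only the output of Phase 3+4, so neither waits on the other. A minimal sketch with the standard library's `graphlib` (Python 3.9+) makes this explicit. It groups phases into batches whose dependencies are already satisfied. The phase keys are hypothetical labels, and nothing here implies that `main.py` actually runs phases in parallel.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it consumes, mirroring the flowchart.
DEPENDENCIES: dict[str, set[str]] = {
    "data_loading": {"raw_data"},
    "content_enrichment": {"data_loading"},
    "concept_extraction": {"content_enrichment"},
    "graph_construction": {"concept_extraction"},
    "visualization": {"graph_construction"},
    "insight_analysis": {"graph_construction"},
}


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into batches; every node in a batch has all its inputs done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(execution_batches(DEPENDENCIES), start=1):
        print(i, batch)
```

Only the last batch contains more than one phase. This matches the fork after Phase 3+4 in the diagram.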

Project Structure

```text
nobel/
├── main.py                   # Pipeline entry point
├── config/
│   └── settings.yaml         # Project configuration
├── src/
│   ├── data_loader.py        # Phase 1: Data loading & cleaning
│   ├── content_enricher.py   # Phase 1.5: Content enrichment
│   ├── openalex_enricher.py  # Phase 2a: OpenAlex concept enrichment
│   ├── concept_extractor.py  # Phase 2b: LLM concept extraction
│   ├── graph_builder.py      # Phase 3+4: Knowledge graph construction
│   ├── visualize.py          # Phase 5: Interactive visualization
│   └── insight_analyzer.py   # Phase 6: Insight analysis & reporting
├── data/                     # Source data (CSV + JSON)
├── output/                   # Generated outputs
│   ├── clean_data/           # Cleaned Parquet files
│   ├── concepts/             # Extracted concepts
│   ├── graph/                # Knowledge graph (JSON/GraphML)
│   ├── viz/                  # HTML visualizations
│   └── reports/              # Analysis reports
└── docs/                     # This documentation
```

Documentation Map

Section	Description
Getting Started	Installation, prerequisites, and first run
Architecture	System design and data flow
Configuration	Detailed configuration reference
Pipeline	Step-by-step pipeline documentation
API Reference	Module-level function documentation
Data	Data sources and knowledge graph schema
Contributing	How to contribute to this project

License

This project is for research and educational purposes.