Nobel Prize Interdisciplinary Knowledge Graph
Unveiling a century of cross-disciplinary inspiration in science through Knowledge Graphs
Overview
This project builds a cross-disciplinary knowledge graph of technical concepts, anchored on Nobel Prize laureates and their core publications. It visualizes key developmental pathways in the natural sciences over the past 100 years, focusing on cross-disciplinary inspiration and the transfer of techniques between fields.
Core Outputs
An interactive, queryable knowledge graph where:
- Nodes = Key technical concepts (e.g., SGD, Transformer, X-ray Crystallography, PCR)
- Edges = Evolutionary relationships between concepts (derivation, application migration, fusion)
- Each concept node is associated with the corresponding Nobel laureate, representative papers, discipline, and temporal coordinates
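The node and edge description above can be sketched as plain Python data classes. This is a minimal illustration, not the project's actual schema: the class names (`ConceptNode`, `ConceptEdge`, `Relation`) and field names are hypothetical, and the choice of an "application migration" edge between the two example concepts is illustrative only. The sketch uses the standard library alone so that it runs without extra dependencies.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The three edge types named in the list above."""
    DERIVATION = "derivation"
    APPLICATION_MIGRATION = "application_migration"
    FUSION = "fusion"


@dataclass(frozen=True)
class ConceptNode:
    # Hypothetical field names; the real schema may differ.
    name: str
    discipline: str
    year: int
    laureates: tuple[str, ...] = ()
    papers: tuple[str, ...] = ()  # representative paper identifiers


@dataclass(frozen=True)
class ConceptEdge:
    source: str
    target: str
    relation: Relation


xray = ConceptNode(
    name="X-ray Crystallography",
    discipline="Physics",
    year=1915,
    laureates=("William Henry Bragg", "William Lawrence Bragg"),
)
helix = ConceptNode(
    name="DNA Double Helix",
    discipline="Physiology or Medicine",
    year=1962,
    laureates=("Francis Crick", "James Watson", "Maurice Wilkins"),
)

# Illustrative edge: a technique from one field applied in another.
edge = ConceptEdge(xray.name, helix.name, Relation.APPLICATION_MIGRATION)
print(edge)
```

Frozen data classes keep nodes hashable, which makes them usable as dictionary keys or as nodes in a graph library if one is layered on top.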
Key Features
| Feature | Description |
|---|---|
| Multi-source Data Integration | Combines records for 757 laureates, 245K+ papers, and 9M+ citations |
| LLM-powered Concept Extraction | Uses GPT-4o to extract structured technical concepts from papers |
| Cross-disciplinary Detection | Automatically identifies concept migration across scientific fields |
| Interactive Visualization | Timeline, network graph, and heatmap visualizations |
| Insight Reports | Auto-generated analysis of hub concepts and field interactions |
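One simple way to picture cross-disciplinary detection and hub-concept analysis is to flag edges whose endpoints belong to different disciplines and to rank concepts by how many edges touch them. The sketch below is an assumption about the general idea, not the project's algorithm: the function names are hypothetical and the toy edges are made up for illustration.

```python
from collections import Counter

# Toy data for illustration only; the edges are hypothetical.
discipline = {
    "X-ray Crystallography": "Physics",
    "DNA Double Helix": "Physiology or Medicine",
    "PCR": "Chemistry",
}
edges = [
    ("X-ray Crystallography", "DNA Double Helix"),
    ("DNA Double Helix", "PCR"),
]


def cross_disciplinary(edges, discipline):
    """Return the edges whose source and target disciplines differ."""
    return [(s, t) for s, t in edges if discipline[s] != discipline[t]]


def field_pair_counts(edges, discipline):
    """Count migrations per (source field, target field) pair, e.g. for a heatmap."""
    return Counter(
        (discipline[s], discipline[t])
        for s, t in cross_disciplinary(edges, discipline)
    )


def hub_concepts(edges, top_n=1):
    """Rank concepts by the number of edges that touch them."""
    degree = Counter()
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
    return degree.most_common(top_n)


print(cross_disciplinary(edges, discipline))
print(field_pair_counts(edges, discipline))
print(hub_concepts(edges))
```

Degree is only one possible notion of a hub; centrality measures from a graph library would be a natural alternative.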
Quick Start
# Install dependencies
uv add polars pandas pyarrow networkx pyvis plotly requests openai pyyaml python-dotenv tqdm scikit-learn
# Run the full pipeline
uv run python main.py
# Or run specific phases
uv run python main.py --phase 1 # Data loading only
uv run python main.py --skip-llm # Skip LLM extraction
See Getting Started for detailed setup instructions.
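The `--phase` and `--skip-llm` flags shown in the commands could be parsed with `argparse` along these lines. This is a hypothetical sketch, not the project's actual `main.py`: the `build_parser` helper, the string type for `--phase` (chosen so that a value such as `1.5` would fit), and the help texts are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags used in the Quick Start commands."""
    parser = argparse.ArgumentParser(description="Run the knowledge graph pipeline")
    parser.add_argument(
        "--phase",
        type=str,
        default=None,
        help="Run a single phase (for example 1); omit to run the full pipeline",
    )
    parser.add_argument(
        "--skip-llm",
        action="store_true",
        help="Skip LLM concept extraction",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--phase", "1"])
print(args.phase, args.skip_llm)
```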
Architecture at a Glance
flowchart LR
A[Raw Data<br>CSV/JSON] --> B[Phase 1<br>Data Loading]
B --> C[Phase 1.5<br>Content Enrichment]
C --> D[Phase 2<br>Concept Extraction]
D --> E[Phase 3+4<br>Graph Construction]
E --> F[Phase 5<br>Visualization]
E --> G[Phase 6<br>Insight Analysis]
style A fill:#f9f,stroke:#333
style E fill:#bbf,stroke:#333
style F fill:#bfb,stroke:#333
style G fill:#bfb,stroke:#333
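The dependencies in the flowchart can be written down as a small DAG and ordered with the standard library's `graphlib`. This only illustrates the data flow: the stage labels are copied from the diagram, while the dictionary layout is an assumption and not how the project schedules its phases.

```python
from graphlib import TopologicalSorter

# Keys are stages; values are the stages they depend on
# (the flowchart arrows, read in reverse).
dependencies = {
    "Phase 1: Data Loading": {"Raw Data"},
    "Phase 1.5: Content Enrichment": {"Phase 1: Data Loading"},
    "Phase 2: Concept Extraction": {"Phase 1.5: Content Enrichment"},
    "Phase 3+4: Graph Construction": {"Phase 2: Concept Extraction"},
    "Phase 5: Visualization": {"Phase 3+4: Graph Construction"},
    "Phase 6: Insight Analysis": {"Phase 3+4: Graph Construction"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for stage in order:
    print(stage)
```

Visualization and insight analysis both depend only on graph construction, so in this reading of the diagram neither needs to wait for the other.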
Project Structure
nobel/
├── main.py # Pipeline entry point
├── config/
│ └── settings.yaml # Project configuration
├── src/
│ ├── data_loader.py # Phase 1: Data loading & cleaning
│ ├── content_enricher.py # Phase 1.5: Content enrichment
│ ├── openalex_enricher.py # Phase 2a: OpenAlex concept enrichment
│ ├── concept_extractor.py # Phase 2b: LLM concept extraction
│ ├── graph_builder.py # Phase 3+4: Knowledge graph construction
│ ├── visualize.py # Phase 5: Interactive visualization
│ └── insight_analyzer.py # Phase 6: Insight analysis & reporting
├── data/ # Source data (CSV + JSON)
├── output/ # Generated outputs
│ ├── clean_data/ # Cleaned Parquet files
│ ├── concepts/ # Extracted concepts
│ ├── graph/ # Knowledge graph (JSON/GraphML)
│ ├── viz/ # HTML visualizations
│ └── reports/ # Analysis reports
└── docs/ # This documentation
Documentation Map
| Section | Description |
|---|---|
| Getting Started | Installation, prerequisites, and first run |
| Architecture | System design and data flow |
| Configuration | Detailed configuration reference |
| Pipeline | Step-by-step pipeline documentation |
| API Reference | Module-level function documentation |
| Data | Data sources and knowledge graph schema |
| Contributing | How to contribute to this project |
License
This project is for research and educational purposes.