## AstroKGC  
## This project focuses on constructing a knowledge graph using large language models (LLMs) for astronomical literature research.  
## Contributor: Jack Wang, South China Normal University  

---

### 1. Database: AstroKGC_test  
> **Directory**: `data/AstroKGC_test`  
> **Data Source**:  
> - Paper_Triple: *Acta Astronomica Sinica* 2023-61-1 to 2023-61-6  
> - Template_Triple: *Acta Astronomica Sinica* 2024-65-1 to 2024-65-2  

> **Extended Dataset (Large-Scale Spiral Galaxy Knowledge Graph)**:  
> - **Scope**: Contains ~300,000 semantically rich triples extracted from 18,341 spiral galaxy research papers.  
> - **Content**: Triples cover key astrophysical domains including galactic dynamics, dark matter halos, AGN feedback mechanisms, arm morphology, star-forming regions, and black hole dynamics.  
> - **Metadata Integration**: Each triple is enriched with full paper metadata:  
>   - `Authors`, `Author Full Names`, `Article Title`, `Source Title`  
>   - `Affiliations`, `Author Keywords`, `Keywords Plus`, `Abstract`  
>   - `Addresses`, `Cited Reference Count`, `ISSN`, `eISSN`  
>   - `Publication Year`, `DOI`  
> - **Format**: Structured in JSON-LD schema for longitudinal research tracking and knowledge provenance analysis.  
> - **Base Model**: Extracted using **DeepSeek-R1:671B** via the AstroKGC framework, validated for accuracy and scalability.  

---

### 2. Environment  
> In the experiments, we utilized five large language models (LLMs).  
> - Three cloud-based models require users to prepare their own OpenAI API keys and endpoint URLs in advance.  
> - For locally deployed large models, we recommend using the "Ollama" framework for setup and deployment.  
> - The provided code demonstrates integration with cloud-based models. To adapt the implementation for locally deployed models, only minor adjustments to the response generation section of the scripts are required.  

---

### 3. How to Run?  
> **AstroKGC Extraction**:  
> ```bash  
> python Extract.py  
> ```  
> **Large-Scale KG Construction**:  
> - The spiral galaxy KG was built semi-automatically using DeepSeek-R1:671B with AstroKGC, incorporating iterative validation for data integrity.  
> - Raw data, TAR outputs, and final KG are archived in the [China-VO repository](链接1).  

---

### 4. Additional Notes  
> - The extended dataset enables multi-dimensional analysis (e.g., correlation between dark matter modeling techniques and observational advances).  
> - All datasets and code are publicly available in the [China-VO paperdata repository](链接1).