Open thread
exploring
Can scientific literature be made genuinely navigable, not just searchable?
A ranked list answers one question: which documents match these keywords. It says nothing about how a field is shaped, which results build on which, where two communities are quietly disagreeing, or which corner nobody has looked at yet. The work at SciX has been about putting that structure on top of the ADS corpus, using embeddings for meaning, citation graphs for how the ideas connect, and controlled vocabularies to keep the grounding honest. What sits below is the current state of that, and the papers I read to push on it.
The work
- SciX Agent (Project)
- Literature Explorers (Project)
- Code Intelligence Digest (Project)
- Experimenting with Large Language Models and vector embeddings in NASA SciX (Paper, mine)
- Making Scientific Knowledge Navigable for Agents (Talk)
Reading path
A generated path through 5 papers, assembled using my SciX literature tools (semantic embeddings, citation graphs, and reading-order signals).
- SPECTER: Document-level Representation Learning using Citation-informed Transformers ↗
  Start with citation-informed document embeddings: how papers cite each other turns out to teach a model what they mean.
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks ↗
  Then the case for domain adaptation: generic models leave signal on the table in specialized corpora.
- Building astroBERT, a Language Model for Astronomy & Astrophysics ↗
  astroBERT applies that to astronomy, a domain model trained on the same ADS corpus I work in.
- Experimenting with Large Language Models and vector embeddings in NASA SciX ↗
  Our SciX experiments: embeddings + vector search over the live literature, and what broke.
- Knowledge Graphs ↗
  Close on knowledge graphs, the structural layer ranked retrieval can never give you.
Open literature
From my ADS library
- Improving Text Embeddings with Large Language Models ↗
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models ↗
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery ↗
- The citation advantage of linking publications to research data ↗
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy ↗
- Building astroBERT, a Language Model for Astronomy & Astrophysics ↗
Pulled from my curated ADS library.