Modules Overview

Omelette is organized into eight core modules that form a research literature pipeline.

| # | Module | Description |
|---|--------|-------------|
| 1 | Keywords | Three-level keyword hierarchy, LLM expansion, search formula generation |
| 2 | Literature Search | Federated search across Semantic Scholar, OpenAlex, arXiv, Crossref |
| 3 | Deduplication | DOI hard dedup, title similarity, LLM-verified dedup |
| 4 | Subscription | RSS feeds and API-based scheduled updates |
| 5 | PDF Crawler | Unpaywall, arXiv, direct URL fallback |
| 6 | OCR | pdfplumber + PaddleOCR for scanned PDFs |
| 7 | RAG Knowledge Base | ChromaDB vectors, hybrid retrieval, LLM answers with citations |
| 8 | Writing Assistant | Summarization, citations (GB/T 7714, APA, MLA), review outlines, gap analysis |
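
As an illustration of the first two Deduplication steps listed above (DOI hard dedup, then title similarity), here is a minimal sketch. It is not Omelette's actual implementation: the function names, the dict-based paper records, and the `0.9` threshold are assumptions made for this example, and the title comparison uses Python's standard `difflib` as a stand-in for whatever similarity measure the module really uses. The LLM verification step is omitted.

```python
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes so equal DOIs compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in (title or "").lower())
    return " ".join(cleaned.split())


def deduplicate(papers, title_threshold=0.9):
    """Keep the first occurrence of each paper (hypothetical helper).

    Step 1: drop records whose normalized DOI was already seen.
    Step 2: drop records whose title is near-identical to a kept title.
    Borderline title matches are where an LLM check could be added.
    """
    kept = []
    seen_dois = set()
    for paper in papers:
        doi = normalize_doi(paper.get("doi"))
        if doi is not None and doi in seen_dois:
            continue  # exact DOI match: hard duplicate
        title = normalize_title(paper.get("title"))
        if title and any(
            SequenceMatcher(None, title, normalize_title(k.get("title"))).ratio()
            >= title_threshold
            for k in kept
        ):
            continue  # near-identical title: treated as a duplicate
        if doi is not None:
            seen_dois.add(doi)
        kept.append(paper)
    return kept
```

The pairwise title comparison is quadratic in the number of kept papers, which is acceptable for a sketch but would likely need blocking or indexing for large result sets.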

Pipeline Flow

Keywords → Search → Dedup → Subscription → Crawler → OCR → RAG → Writing

Each module can be used independently or as part of the full pipeline. Projects organize literature; keywords drive search; results flow through dedup, crawl, OCR, and indexing before being queried for writing assistance.
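
The idea that each module can run alone or as part of the full chain can be sketched as a list of interchangeable stages. This is only an illustrative model, not Omelette's API: `PIPELINE_ORDER`, `make_stage`, `STAGES`, and `run_pipeline` are hypothetical names, and each placeholder stage simply records that it ran instead of doing real work.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]

# Stage order taken from the pipeline flow above; the identifiers are hypothetical.
PIPELINE_ORDER = [
    "keywords", "search", "dedup", "subscription",
    "crawler", "ocr", "rag", "writing",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that appends its name to each record's history."""
    def stage(records: List[Record]) -> List[Record]:
        return [
            {**r, "history": list(r.get("history", [])) + [name]}
            for r in records
        ]
    return stage


STAGES: Dict[str, Stage] = {name: make_stage(name) for name in PIPELINE_ORDER}


def run_pipeline(records: List[Record], stage_names: List[str] = PIPELINE_ORDER) -> List[Record]:
    """Apply the selected stages in order; pass a single name to run one module alone."""
    for name in stage_names:
        records = STAGES[name](records)
    return records
```

Calling `run_pipeline(records)` walks all eight stages in order, while `run_pipeline(records, ["dedup"])` runs a single stage on its own. Because every stage takes and returns a list of records, any contiguous or partial subset can be chained the same way.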

Released under the MIT License.