mLLMCelltype is an iterative multi-LLM consensus framework for cell type annotation in single-cell RNA sequencing data. By combining the complementary strengths of multiple large language models, the framework improves annotation accuracy over single-model approaches and provides transparent uncertainty quantification.
The package implements a novel approach in which multiple large language models (LLMs) collaborate through structured deliberation to produce more accurate and reliable cell type annotations than any single model provides on its own.
Cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis. Traditional methods often rely on reference datasets or marker gene databases, which can be limited by the availability of high-quality references and the complexity of cell types across different tissues and conditions.
Large language models have shown promising results in cell type annotation by drawing on their extensive knowledge of the biological literature and their ability to reason about gene expression patterns. However, an individual LLM can hallucinate or make errors because of gaps in its training data or limits in its reasoning.
mLLMCelltype addresses these challenges by implementing a consensus-based approach where multiple LLMs collaborate to provide more reliable annotations.
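The simplest form of consensus is a majority vote over the labels each model proposes for a cluster. The sketch below shows only that first aggregation step, assuming answers arrive as plain strings. `normalize_label` and `majority_consensus` are hypothetical helpers, not part of mLLMCelltype, and the package's own procedure (which adds rounds of deliberation) is not reproduced here.

```python
from __future__ import annotations

from collections import Counter


def normalize_label(label: str) -> str:
    """Collapse case, extra spaces, and a trailing plural ('T cells' -> 't cell')."""
    key = " ".join(label.lower().split())
    if key.endswith(" cells"):
        key = key[:-1]  # drop the final "s"
    return key


def majority_consensus(annotations: dict[str, str]) -> tuple[str, int]:
    """Return the most common normalized label and how many models voted for it.

    Ties go to the label seen first (Counter.most_common keeps insertion order).
    """
    counts = Counter(normalize_label(label) for label in annotations.values())
    label, votes = counts.most_common(1)[0]
    return label, votes


# Hypothetical per-model answers for one cluster.
annotations = {"model_a": "T cells", "model_b": "T cell", "model_c": "NK cells"}
label, votes = majority_consensus(annotations)
print(label, votes)  # t cell 2
```

Normalizing before counting matters: without it, "T cells" and "T cell" would be counted as two different answers and no label would win a majority.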
mLLMCelltype harnesses collective intelligence from diverse LLMs to overcome single-model limitations and biases. The package currently supports a wide range of models:
By integrating multiple models with different architectures and training data, mLLMCelltype can achieve more robust and accurate annotations than any single model.
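Because every model answers the same question, the input side can be as simple as one shared prompt per cluster, built from that cluster's marker genes. The sketch below illustrates the pattern under stated assumptions: the prompt wording, the `build_prompt` and `query_models` helpers, and the stand-in lambda models are all hypothetical and do not reflect how mLLMCelltype formats prompts or calls model providers.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical interface: a "model" is any function mapping a prompt to a label.
Model = Callable[[str], str]


def build_prompt(tissue: str, cluster_id: str, markers: list[str]) -> str:
    """Format one cluster's top marker genes into an annotation prompt."""
    return (
        f"Tissue: {tissue}. Cluster {cluster_id} top marker genes: "
        f"{', '.join(markers)}. Reply with the single most likely cell type."
    )


def query_models(models: dict[str, Model], prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and collect each answer by model name."""
    return {name: model(prompt) for name, model in models.items()}


# Stand-in models; a real setup would call each provider's API instead.
models: dict[str, Model] = {
    "model_a": lambda p: "T cells" if "CD3E" in p else "Unknown",
    "model_b": lambda p: "CD4+ T cell" if "IL7R" in p else "T cell",
    "model_c": lambda p: "T cells",
}
prompt = build_prompt("human PBMC", "0", ["CD3E", "IL7R", "CCR7"])
answers = query_models(models, prompt)
print(answers)  # {'model_a': 'T cells', 'model_b': 'CD4+ T cell', 'model_c': 'T cells'}
```

Keeping every model behind the same prompt-in, label-out interface is what makes heterogeneous models comparable cluster by cluster. Note that `model_b` gives a more specific answer than the others, which is one reason answers need to be reconciled rather than matched string for string.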
The package enables LLMs to share reasoning, evaluate evidence, and refine annotations through multiple rounds of collaborative discussion. This structured deliberation process includes:
This process mimics how a panel of human experts might collaborate to reach a consensus on difficult cases.
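A minimal sketch of a multi-round loop, assuming a simplified setting: each model sees only the other models' previous labels (not their reasoning), and discussion stops once the leading label's vote share reaches a threshold or a round limit is hit. The `deliberate` function, its stopping rule, and the stand-in `stubborn` and `persuadable` models are hypothetical; mLLMCelltype's actual deliberation prompts and termination criteria may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Hypothetical interface: a model receives the prompt plus every model's
# answer from the previous round (empty in round 1) and returns a label.
Deliberator = Callable[[str, dict[str, str]], str]


def deliberate(models: dict[str, Deliberator], prompt: str,
               threshold: float = 0.75, max_rounds: int = 3) -> tuple[str, float, int]:
    """Run rounds until the leading label's vote share reaches `threshold`.

    Returns (leading label, its vote share, number of rounds run).
    """
    answers = {name: m(prompt, {}) for name, m in models.items()}  # independent first round
    rounds = 1
    while True:
        label, votes = Counter(answers.values()).most_common(1)[0]
        share = votes / len(answers)
        if share >= threshold or rounds >= max_rounds:
            return label, share, rounds
        previous = dict(answers)  # every model sees the same snapshot of the last round
        answers = {name: m(prompt, previous) for name, m in models.items()}
        rounds += 1


def stubborn(label: str) -> Deliberator:
    """Stand-in model that never changes its answer."""
    return lambda prompt, peers: label


def persuadable(initial: str) -> Deliberator:
    """Stand-in model that adopts the most common peer answer after round 1."""
    def model(prompt: str, peers: dict[str, str]) -> str:
        return Counter(peers.values()).most_common(1)[0][0] if peers else initial
    return model


models = {
    "model_a": stubborn("T cell"),
    "model_b": stubborn("T cell"),
    "model_c": persuadable("NK cell"),
    "model_d": persuadable("B cell"),
}
print(deliberate(models, "cluster 3 markers: CD3E, CD2, IL7R"))  # ('T cell', 1.0, 2)
```

In this example the first round splits 2/1/1 (share 0.5), the two minority models revise after seeing their peers' answers, and the loop stops in round 2 with unanimous agreement.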
mLLMCelltype provides quantitative metrics to identify ambiguous cell populations that may require expert review:
These metrics help researchers identify which cell clusters have high-confidence annotations and which may require further investigation.
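Two simple agreement measures illustrate the idea: the fraction of models backing the leading label (a consensus proportion) and the Shannon entropy of the label distribution, H = -sum_i p_i log2(p_i), where p_i is the share of models proposing label i. The sketch below computes both and flags weak clusters. The function names and the cutoffs (`min_cp=0.6`, `max_entropy=1.0` bits) are illustrative assumptions, not mLLMCelltype's API or defaults.

```python
from __future__ import annotations

import math
from collections import Counter


def consensus_proportion(labels: list[str]) -> float:
    """Fraction of models that agree with the most common label (1.0 = unanimous)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def shannon_entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of the label distribution (0.0 = unanimous)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def flag_ambiguous(clusters: dict[str, list[str]],
                   min_cp: float = 0.6, max_entropy: float = 1.0) -> list[str]:
    """Return IDs of clusters whose agreement is too weak to trust without review."""
    return [
        cid for cid, labels in clusters.items()
        if consensus_proportion(labels) < min_cp or shannon_entropy(labels) > max_entropy
    ]


# Hypothetical final labels from five models for two clusters.
clusters = {
    "0": ["T cell"] * 5,
    "1": ["T cell", "T cell", "NK cell", "NK cell", "B cell"],
}
for cid, labels in clusters.items():
    print(cid, consensus_proportion(labels), round(shannon_entropy(labels), 3))
print(flag_ambiguous(clusters))  # ['1']
```

Cluster 1 is flagged because only 2 of 5 models agree on the leading label (proportion 0.4) and its entropy is about 1.52 bits; cluster 0 is unanimous, so both measures sit at their best values.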
mLLMCelltype is designed for a wide range of single-cell RNA sequencing analysis scenarios:
For a complete list of updates, please refer to the NEWS.md file.
To get started with mLLMCelltype, please refer to the Getting Started Guide and Usage Tutorial.
If you use mLLMCelltype in your research, please cite:
Yang, C., Zhang, X., & Chen, J. (2025). Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. bioRxiv.
https://doi.org/10.1101/2025.04.10.647852