Type: Package
Title: Cell Type Annotation Using Large Language Models
Version: 1.3.2
Description: Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models (LLMs). LLMs are artificial intelligence models trained on vast text corpora to understand and generate human-like text. This package integrates with 'Seurat' objects and provides uncertainty quantification for annotations. Supports various LLM providers including 'OpenAI', 'Anthropic', and 'Google'. The package leverages these models through their respective APIs (Application Programming Interfaces) https://platform.openai.com/docs, https://docs.anthropic.com/, and https://ai.google.dev/gemini-api/docs. For details see Yang et al. (2025) <doi:10.1101/2025.04.10.647852>.
License: MIT + file LICENSE
BugReports: https://github.com/cafferychen777/mLLMCelltype/issues
URL: https://cafferyang.com/mLLMCelltype/
Encoding: UTF-8
Imports: dplyr, httr (≥ 1.4.0), jsonlite (≥ 1.7.0), R6 (≥ 2.5.0), digest (≥ 0.6.25), magrittr, utils
Suggests: knitr, rmarkdown, Seurat
RoxygenNote: 7.3.2
Config/build/clean-inst-doc: FALSE
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-08-28 09:18:39 UTC; apple
Author: Chen Yang [aut, cre, cph]
Maintainer: Chen Yang <cafferychen777@tamu.edu>
Repository: CRAN
Date/Publication: 2025-09-02 20:50:12 UTC
mLLMCelltype: Cell Type Annotation Using Large Language Models
Description
Automated cell type annotation for single-cell RNA sequencing data using consensus predictions from multiple large language models (LLMs). LLMs are artificial intelligence models trained on vast text corpora to understand and generate human-like text. This package integrates with 'Seurat' objects and provides uncertainty quantification for annotations. Supports various LLM providers including 'OpenAI', 'Anthropic', and 'Google'. The package leverages these models through their respective APIs (Application Programming Interfaces) https://platform.openai.com/docs, https://docs.anthropic.com/, and https://ai.google.dev/gemini-api/docs. For details see Yang et al. (2025) doi:10.1101/2025.04.10.647852.
Author(s)
Maintainer: Chen Yang cafferychen777@tamu.edu [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/cafferychen777/mLLMCelltype/issues
Package startup message
Description
Package startup message
Usage
.onAttach(libname, pkgname)
Package load message
Description
Package load message
Usage
.onLoad(libname, pkgname)
Anthropic API Processor
Description
Anthropic API Processor
Details
Concrete implementation of BaseAPIProcessor for Anthropic models. Handles Anthropic-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> AnthropicProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Anthropic processor
Usage
AnthropicProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Anthropic API
Method get_default_api_url()
Get default Anthropic API URL
Usage
AnthropicProcessor$get_default_api_url()
Returns
Default Anthropic API endpoint URL
Method make_api_call()
Make API call to Anthropic
Usage
AnthropicProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Anthropic API response
Usage
AnthropicProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
AnthropicProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Base API Processor Class
Description
Base API Processor Class
Details
Abstract base class for API processors that provides common functionality including unified logging, error handling, input processing, and response validation. This eliminates code duplication across all provider-specific processors.
Public fields
provider_name
Name of the API provider
logger
Unified logger instance
base_url
Custom base URL for API endpoints
Methods
Public methods
Method new()
Initialize the base API processor
Usage
BaseAPIProcessor$new(provider_name, base_url = NULL)
Arguments
provider_name
Name of the API provider (e.g., "openai", "anthropic")
base_url
Optional custom base URL for API endpoints
Method process_request()
Main entry point for processing API requests
Usage
BaseAPIProcessor$process_request(prompt, model, api_key)
Arguments
prompt
Input prompt text
model
Model identifier
api_key
API key for authentication
Returns
Processed response as character vector
Method get_api_url()
Get the API URL to use for requests
Usage
BaseAPIProcessor$get_api_url()
Returns
API URL string
Method get_default_api_url()
Abstract method to be implemented by subclasses for getting default API URL
Usage
BaseAPIProcessor$get_default_api_url()
Returns
Default API URL string
Method make_api_call()
Abstract method to be implemented by subclasses for making the actual API call
Usage
BaseAPIProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
Raw API response
Method extract_response_content()
Abstract method to be implemented by subclasses for extracting content from response
Usage
BaseAPIProcessor$extract_response_content(response, model)
Arguments
response
Raw API response
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
BaseAPIProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
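The division of labour between the base class and its provider subclasses can be sketched with a toy R6 hierarchy. This is only an illustration of the pattern described above, not the package's source: the class names `ToyBaseProcessor` and `ToyEchoProcessor`, the placeholder URLs, and the echo behaviour are all hypothetical, and the real `process_request()` also performs logging, input processing, and validation that are omitted here.

```r
library(R6)

# Hypothetical base class: holds the workflow shared by all providers.
ToyBaseProcessor <- R6Class("ToyBaseProcessor",
  public = list(
    provider_name = NULL,
    base_url = NULL,
    initialize = function(provider_name, base_url = NULL) {
      self$provider_name <- provider_name
      self$base_url <- base_url
    },
    # A custom base URL wins; otherwise fall back to the subclass default.
    get_api_url = function() {
      if (!is.null(self$base_url)) self$base_url else self$get_default_api_url()
    },
    # Fixed sequence of steps; the provider-specific parts are delegated.
    process_request = function(prompt, model, api_key) {
      response <- self$make_api_call(prompt, model, api_key)
      self$extract_response_content(response, model)
    },
    # "Abstract" hooks that every subclass is expected to override.
    get_default_api_url = function() stop("not implemented"),
    make_api_call = function(chunk_content, model, api_key) stop("not implemented"),
    extract_response_content = function(response, model) stop("not implemented")
  )
)

# Hypothetical provider that upper-cases the prompt instead of calling an API.
ToyEchoProcessor <- R6Class("ToyEchoProcessor",
  inherit = ToyBaseProcessor,
  public = list(
    initialize = function(base_url = NULL) {
      super$initialize("echo", base_url)
    },
    get_default_api_url = function() "https://example.invalid/v1/chat",
    make_api_call = function(chunk_content, model, api_key) {
      list(url = self$get_api_url(), body = list(text = toupper(chunk_content)))
    },
    extract_response_content = function(response, model) {
      response$body$text
    }
  )
)

proc <- ToyEchoProcessor$new()
answer <- proc$process_request("cd3d, cd3e", model = "toy-model", api_key = "unused")
custom <- ToyEchoProcessor$new(base_url = "https://proxy.example.invalid/chat")
```

Keeping the request sequence in one place means a new provider only has to supply the three hooks.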
Cache Manager Class
Description
Manages caching of consensus analysis results
Public fields
cache_dir
Directory to store cache files
cache_version
Current cache version
Methods
Public methods
Method new()
Initialize cache manager
Usage
CacheManager$new(cache_dir = NULL)
Arguments
cache_dir
Directory to store cache files (defaults to tempdir())
Method generate_key()
Generate a cache key from the input parameters
Usage
CacheManager$generate_key(input, models, cluster_id)
Arguments
input
Input data
models
Models used
cluster_id
Cluster ID
Returns
Cache key string
Method save_to_cache()
Save results to cache
Usage
CacheManager$save_to_cache(key, data)
Arguments
key
Cache key
data
Data to cache
Method load_from_cache()
Load results from cache
Usage
CacheManager$load_from_cache(key)
Arguments
key
Cache key
Returns
Cached data if it exists, NULL otherwise
Method has_cache()
Check if results exist in cache
Usage
CacheManager$has_cache(key)
Arguments
key
Cache key
Returns
TRUE if cached results exist
Method get_cache_stats()
Get cache statistics
Usage
CacheManager$get_cache_stats()
Returns
A list with cache statistics
Method clear_cache()
Clear all cache
Usage
CacheManager$clear_cache(confirm = FALSE)
Arguments
confirm
Logical. If TRUE, clears the cache without asking for confirmation
Method validate_cache()
Validate cache content
Usage
CacheManager$validate_cache(key)
Arguments
key
Cache key
Returns
TRUE if cache is valid, FALSE otherwise
Method clone()
The objects of this class are cloneable with this method.
Usage
CacheManager$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
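A file-backed cache of this kind can be sketched in base R. The helper names, the key recipe (an MD5 hash of sorted genes, sorted models, and the cluster ID), and the one-`.rds`-file-per-key layout are assumptions made for illustration; the package's own key format and storage layout are not documented here.

```r
# Hypothetical helpers; names and file layout are illustrative only.
make_cache_key <- function(genes, models, cluster_id) {
  # Sort so that the key does not depend on the order of genes or models.
  canonical <- paste(
    paste(sort(genes), collapse = ","),
    paste(sort(models), collapse = ","),
    cluster_id,
    sep = "|"
  )
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(canonical, tmp)
  unname(tools::md5sum(tmp))
}

cache_path <- function(cache_dir, key) file.path(cache_dir, paste0(key, ".rds"))

save_to_cache <- function(cache_dir, key, data) {
  saveRDS(data, cache_path(cache_dir, key))
  invisible(key)
}

load_from_cache <- function(cache_dir, key) {
  path <- cache_path(cache_dir, key)
  if (file.exists(path)) readRDS(path) else NULL
}

# Fresh directory so the first lookup is guaranteed to miss.
cache_dir <- tempfile("toy_cache_")
dir.create(cache_dir)

key_a <- make_cache_key(c("CD3D", "CD3E"), c("model-a", "model-b"), cluster_id = 0)
key_b <- make_cache_key(c("CD3E", "CD3D"), c("model-b", "model-a"), cluster_id = 0)

miss <- load_from_cache(cache_dir, key_a)   # nothing stored yet
save_to_cache(cache_dir, key_a, list(label = "T cells"))
hit <- load_from_cache(cache_dir, key_b)    # same key, so the same entry
```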
DeepSeek API Processor
Description
DeepSeek API Processor
Details
Concrete implementation of BaseAPIProcessor for DeepSeek models. Handles DeepSeek-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> DeepSeekProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize DeepSeek processor
Usage
DeepSeekProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for DeepSeek API
Method get_default_api_url()
Get default DeepSeek API URL
Usage
DeepSeekProcessor$get_default_api_url()
Returns
Default DeepSeek API endpoint URL
Method make_api_call()
Make API call to DeepSeek
Usage
DeepSeekProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from DeepSeek API response
Usage
DeepSeekProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
DeepSeekProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Gemini API Processor
Description
Gemini API Processor
Details
Concrete implementation of BaseAPIProcessor for Gemini models. Handles Gemini-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> GeminiProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Gemini processor
Usage
GeminiProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Gemini API
Method get_default_api_url()
Get default Gemini API URL template
Usage
GeminiProcessor$get_default_api_url()
Returns
Default Gemini API endpoint URL template
Method get_api_url_for_model()
Get API URL for specific model
Usage
GeminiProcessor$get_api_url_for_model(model)
Arguments
model
Model identifier
Returns
Complete API URL for the model
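Filling a URL template with a model identifier might look like the sketch below. The endpoint shape follows the public Gemini REST convention, but the `%s` placeholder syntax and the helper name `url_for_model` are assumptions; the package may store and expand its template differently.

```r
# Assumed template: "%s" marks where the model identifier goes.
gemini_url_template <-
  "https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent"

# Hypothetical helper mirroring the idea of get_api_url_for_model().
url_for_model <- function(model, template = gemini_url_template) {
  stopifnot(is.character(model), length(model) == 1, nzchar(model))
  sprintf(template, model)
}

url <- url_for_model("gemini-2.5-flash")
```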
Method make_api_call()
Make API call to Gemini
Usage
GeminiProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Gemini API response
Usage
GeminiProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
GeminiProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Grok API Processor
Description
Grok API Processor
Details
Concrete implementation of BaseAPIProcessor for Grok models. Handles Grok-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> GrokProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Grok processor
Usage
GrokProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Grok API
Method get_default_api_url()
Get default Grok API URL
Usage
GrokProcessor$get_default_api_url()
Returns
Default Grok API endpoint URL
Method make_api_call()
Make API call to Grok
Usage
GrokProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Grok API response
Usage
GrokProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
GrokProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Minimax API Processor
Description
Minimax API Processor
Details
Concrete implementation of BaseAPIProcessor for Minimax models. Handles Minimax-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> MinimaxProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Minimax processor
Usage
MinimaxProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Minimax API
Method get_default_api_url()
Get default Minimax API URL
Usage
MinimaxProcessor$get_default_api_url()
Returns
Default Minimax API endpoint URL
Method make_api_call()
Make API call to Minimax
Usage
MinimaxProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Minimax API response
Usage
MinimaxProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
MinimaxProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
OpenAI API Processor
Description
OpenAI API Processor
Details
Concrete implementation of BaseAPIProcessor for OpenAI models. Handles OpenAI-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> OpenAIProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize OpenAI processor
Usage
OpenAIProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for OpenAI API
Method get_default_api_url()
Get default OpenAI API URL
Usage
OpenAIProcessor$get_default_api_url()
Returns
Default OpenAI API endpoint URL
Method make_api_call()
Make API call to OpenAI
Usage
OpenAIProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from OpenAI API response
Usage
OpenAIProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
OpenAIProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
OpenRouter API Processor
Description
OpenRouter API Processor
Details
Concrete implementation of BaseAPIProcessor for OpenRouter models. Handles OpenRouter-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> OpenRouterProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize OpenRouter processor
Usage
OpenRouterProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for OpenRouter API
Method get_default_api_url()
Get default OpenRouter API URL
Usage
OpenRouterProcessor$get_default_api_url()
Returns
Default OpenRouter API endpoint URL
Method make_api_call()
Make API call to OpenRouter
Usage
OpenRouterProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from OpenRouter API response
Usage
OpenRouterProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
OpenRouterProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Qwen API Processor
Description
Qwen API Processor
Details
Concrete implementation of BaseAPIProcessor for Qwen models. Handles Qwen-specific API calls, authentication, and response parsing.
Qwen has two API endpoints:
International: https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation (preferred)
Domestic (China): https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation (fallback)
The processor automatically tries the international endpoint first, then falls back to the domestic one if needed.
Super class
mLLMCelltype::BaseAPIProcessor
-> QwenProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Qwen processor
Usage
QwenProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Qwen API
Method get_default_api_url()
Get default Qwen API URL with intelligent endpoint selection
Usage
QwenProcessor$get_default_api_url()
Returns
Default Qwen API endpoint URL
Method get_working_api_url()
Get working Qwen API URL with automatic endpoint detection
Usage
QwenProcessor$get_working_api_url(api_key)
Arguments
api_key
API key for testing endpoints
Returns
Working Qwen API endpoint URL
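The "preferred endpoint first, fallback second" behaviour can be sketched without any network access by injecting the reachability check. The helper `first_working_endpoint` and the simulated probe are hypothetical; how the package actually tests an endpoint (request type, timeout, status codes) is not specified here.

```r
# Endpoints from the Details section, in order of preference.
qwen_endpoints <- c(
  international = "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation",
  domestic = "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation"
)

# Hypothetical helper: return the first endpoint that the probe accepts.
# `is_reachable` is passed in so the logic can be exercised offline.
first_working_endpoint <- function(endpoints, is_reachable) {
  for (url in endpoints) {
    ok <- tryCatch(isTRUE(is_reachable(url)), error = function(e) FALSE)
    if (ok) return(url)
  }
  stop("No endpoint is reachable")
}

# Simulated probe: pretend the international endpoint is blocked.
fake_probe <- function(url) !grepl("dashscope-intl", url, fixed = TRUE)
chosen <- first_working_endpoint(qwen_endpoints, fake_probe)
```

A probe that throws an error is treated the same as an unreachable endpoint, so a network failure on the first URL does not abort the search.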
Method make_api_call()
Make API call to Qwen
Usage
QwenProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Qwen API response
Usage
QwenProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
QwenProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
StepFun API Processor
Description
StepFun API Processor
Details
Concrete implementation of BaseAPIProcessor for StepFun models. Handles StepFun-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> StepFunProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize StepFun processor
Usage
StepFunProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for StepFun API
Method get_default_api_url()
Get default StepFun API URL
Usage
StepFunProcessor$get_default_api_url()
Returns
Default StepFun API endpoint URL
Method make_api_call()
Make API call to StepFun
Usage
StepFunProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from StepFun API response
Usage
StepFunProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
StepFunProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Unified Logger for mLLMCelltype Package
Description
Unified Logger for mLLMCelltype Package
Details
This logger provides centralized, multi-level logging with structured output, log rotation, and performance monitoring capabilities.
Public fields
log_dir
Directory for storing log files
log_level
Current logging level
session_id
Unique identifier for the current session
max_log_size
Maximum log file size in MB (default: 10MB)
max_log_files
Maximum number of log files to keep (default: 5)
enable_console
Whether to output to console (default: TRUE)
enable_json
Whether to use JSON format (default: TRUE)
performance_stats
Performance monitoring statistics
Methods
Public methods
Method new()
Initialize the unified logger
Usage
UnifiedLogger$new( base_dir = NULL, level = "INFO", max_size = 10, max_files = 5, console_output = TRUE, json_format = TRUE )
Arguments
base_dir
Base directory for logs (defaults to tempdir())
level
Logging level: DEBUG, INFO, WARN, ERROR (default: "INFO")
max_size
Maximum log file size in MB (default: 10)
max_files
Maximum number of log files to keep (default: 5)
console_output
Whether to output to console (default: TRUE)
json_format
Whether to use JSON format (default: TRUE)
Method debug()
Log a debug message
Usage
UnifiedLogger$debug(message, context = NULL)
Arguments
message
Log message
context
Additional context (optional)
Method info()
Log an info message
Usage
UnifiedLogger$info(message, context = NULL)
Arguments
message
Log message
context
Additional context (optional)
Method warn()
Log a warning message
Usage
UnifiedLogger$warn(message, context = NULL)
Arguments
message
Log message
context
Additional context (optional)
Method error()
Log an error message
Usage
UnifiedLogger$error(message, context = NULL)
Arguments
message
Log message
context
Additional context (optional)
Method log_api_call()
Log API call performance
Usage
UnifiedLogger$log_api_call( provider, model, duration, success = TRUE, tokens = NULL )
Arguments
provider
API provider name
model
Model name
duration
Duration in seconds
success
Whether the call was successful
tokens
Number of tokens used (optional)
Method log_api_request_response()
Log complete API request and response for debugging and audit
Usage
UnifiedLogger$log_api_request_response( provider, model, prompt_content, response_content, request_metadata = NULL, response_metadata = NULL )
Arguments
provider
API provider name
model
Model name
prompt_content
The complete prompt sent to the API
response_content
The complete response received from the API
request_metadata
Additional request metadata (optional)
response_metadata
Additional response metadata (optional)
Method log_cache_operation()
Log cache operations
Usage
UnifiedLogger$log_cache_operation(operation, key, size = NULL)
Arguments
operation
Operation type: "hit", "miss", "store", "clear"
key
Cache key
size
Size of cached data (optional)
Method log_cluster_progress()
Log cluster annotation progress
Usage
UnifiedLogger$log_cluster_progress(cluster_id, stage, progress = NULL)
Arguments
cluster_id
Cluster identifier
stage
Current stage
progress
Progress information
Method log_discussion()
Log detailed cluster discussion with complete model conversations
Usage
UnifiedLogger$log_discussion(cluster_id, event_type, data = NULL)
Arguments
cluster_id
Cluster identifier
event_type
Type of event (start, prediction, consensus, end)
data
Event data
Method get_performance_summary()
Get performance summary
Usage
UnifiedLogger$get_performance_summary()
Returns
List of performance statistics
Method cleanup_logs()
Clean up old log files
Usage
UnifiedLogger$cleanup_logs(force = FALSE)
Arguments
force
Force cleanup even if within file limits
Method set_level()
Set logging level
Usage
UnifiedLogger$set_level(level)
Arguments
level
New logging level: DEBUG, INFO, WARN, ERROR
Method clone()
The objects of this class are cloneable with this method.
Usage
UnifiedLogger$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
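Level-based filtering, the core of any multi-level logger, can be sketched as follows. The severity ordering and the plain-text line format are assumptions, and the factory `make_logger` is hypothetical; timestamps, JSON output, file rotation, and performance statistics are deliberately left out.

```r
# Assumed severity order, lowest to highest.
log_levels <- c(DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4)

# Hypothetical factory returning a logger closure with a minimum level.
make_logger <- function(level = "INFO") {
  threshold <- log_levels[[level]]
  function(msg_level, message, context = NULL) {
    # Messages below the configured level are dropped.
    if (log_levels[[msg_level]] < threshold) return(invisible(NULL))
    line <- sprintf("[%s] %s", msg_level, message)
    if (!is.null(context)) {
      pairs <- paste(names(context), unlist(context), sep = "=", collapse = " ")
      line <- paste(line, pairs)
    }
    line
  }
}

log_info <- make_logger("INFO")
dropped <- log_info("DEBUG", "raw response received")
kept <- log_info("WARN", "retrying request",
                 context = list(provider = "openai", attempt = 2))
```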
Zhipu API Processor
Description
Zhipu API Processor
Details
Concrete implementation of BaseAPIProcessor for Zhipu models. Handles Zhipu-specific API calls, authentication, and response parsing.
Super class
mLLMCelltype::BaseAPIProcessor
-> ZhipuProcessor
Methods
Public methods
Inherited methods
Method new()
Initialize Zhipu processor
Usage
ZhipuProcessor$new(base_url = NULL)
Arguments
base_url
Optional custom base URL for Zhipu API
Method get_default_api_url()
Get default Zhipu API URL
Usage
ZhipuProcessor$get_default_api_url()
Returns
Default Zhipu API endpoint URL
Method make_api_call()
Make API call to Zhipu
Usage
ZhipuProcessor$make_api_call(chunk_content, model, api_key)
Arguments
chunk_content
Content for this chunk
model
Model identifier
api_key
API key
Returns
httr response object
Method extract_response_content()
Extract response content from Zhipu API response
Usage
ZhipuProcessor$extract_response_content(response, model)
Arguments
response
httr response object
model
Model identifier
Returns
Extracted text content
Method clone()
The objects of this class are cloneable with this method.
Usage
ZhipuProcessor$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Cell Type Annotation with Multi-LLM Framework
Description
Annotates cell types automatically using a large language model (LLM) chosen from several supported providers. The function accepts either Seurat's differential gene expression results or custom gene lists as input, and queries the selected model to identify cell types from the marker genes of each cluster.
Usage
annotate_cell_types(
input,
tissue_name = NULL,
model = "gpt-4o",
api_key = NA,
top_gene_count = 10,
debug = FALSE,
base_urls = NULL
)
Arguments
input |
One of the following:
IMPORTANT NOTE ON CLUSTER IDs: The 'cluster' column must contain numeric values or values that can be converted to numeric. Non-numeric cluster IDs (e.g., "cluster_1", "T_cells", "7_0") may cause errors or unexpected behavior. Before using this function, it is recommended to:
|
tissue_name |
Character string specifying the tissue type or cell source (e.g., 'human PBMC', 'mouse brain'). This helps provide context for more accurate annotations. |
model |
Character string specifying the LLM model to use. Supported models:
|
api_key |
Character string containing the API key for the selected model. Each provider requires a specific API key format and authentication method:
The API key can be provided directly or stored in environment variables:
# Direct API key
result <- annotate_cell_types(input, tissue_name, model="gpt-4o", api_key="sk-...")
# Using environment variables
Sys.setenv(OPENAI_API_KEY="sk-...")
Sys.setenv(ANTHROPIC_API_KEY="sk-ant-...")
Sys.setenv(OPENROUTER_API_KEY="sk-or-...")
# Then use with environment variables
result <- annotate_cell_types(input, tissue_name, model="claude-3-opus", api_key=Sys.getenv("ANTHROPIC_API_KEY"))
If NA, returns the generated prompt without making an API call, which is useful for reviewing the prompt before sending it to the API. |
top_gene_count |
Integer specifying the number of top marker genes to use per cluster when input is from Seurat's FindAllMarkers(). Default: 10. |
debug |
Logical. If TRUE, prints additional debugging information during execution. |
base_urls |
Optional custom base URLs for API endpoints. Can be:
|
Value
A character vector containing:
When api_key is provided: One cell type annotation per cluster, in the order of input clusters
When api_key is NA: The generated prompt string that would be sent to the LLM
Examples
# Example 1: Using custom gene lists, returning prompt only (no API call)
annotate_cell_types(
input = list(
t_cells = list(genes = c('CD3D', 'CD3E', 'CD3G', 'CD28')),
b_cells = list(genes = c('CD19', 'CD79A', 'CD79B', 'MS4A1')),
monocytes = list(genes = c('CD14', 'CD68', 'CSF1R', 'FCGR3A'))
),
tissue_name = 'human PBMC',
model = 'gpt-4o',
api_key = NA # Returns prompt only without making API call
)
# Example 2: Using with Seurat pipeline and OpenAI model
## Not run:
library(Seurat)
# Load example data
data("pbmc_small")
# Find marker genes
all.markers <- FindAllMarkers(
object = pbmc_small,
only.pos = TRUE,
min.pct = 0.25,
logfc.threshold = 0.25
)
# Set API key in environment variable (recommended approach)
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
# Get cell type annotations using OpenAI model
openai_annotations <- annotate_cell_types(
input = all.markers,
tissue_name = 'human PBMC',
model = 'gpt-4o',
api_key = Sys.getenv("OPENAI_API_KEY"),
top_gene_count = 15
)
# Example 3: Using Anthropic Claude model
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")
claude_annotations <- annotate_cell_types(
input = all.markers,
tissue_name = 'human PBMC',
model = 'claude-3-opus',
api_key = Sys.getenv("ANTHROPIC_API_KEY"),
top_gene_count = 15
)
# Example 4: Using OpenRouter to access multiple models
Sys.setenv(OPENROUTER_API_KEY = "your-openrouter-api-key")
# Access OpenAI models through OpenRouter
openrouter_gpt4_annotations <- annotate_cell_types(
input = all.markers,
tissue_name = 'human PBMC',
model = 'openai/gpt-4o', # Note the provider/model format
api_key = Sys.getenv("OPENROUTER_API_KEY"),
top_gene_count = 15
)
# Access Anthropic models through OpenRouter
openrouter_claude_annotations <- annotate_cell_types(
input = all.markers,
tissue_name = 'human PBMC',
model = 'anthropic/claude-3-opus', # Note the provider/model format
api_key = Sys.getenv("OPENROUTER_API_KEY"),
top_gene_count = 15
)
# Example 5: Using with mouse brain data
mouse_annotations <- annotate_cell_types(
input = mouse_markers, # Your mouse marker genes
tissue_name = 'mouse brain', # Specify correct tissue for context
model = 'gpt-4o',
api_key = Sys.getenv("OPENAI_API_KEY"),
top_gene_count = 20, # Use more genes for complex tissues
debug = TRUE # Enable debug output
)
## End(Not run)
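The note on cluster IDs in the Arguments section asks for a numeric 'cluster' column. One possible way to prepare such a column is sketched below. The toy table and the 0-based codes in order of first appearance are assumptions, and the lookup table is kept so that annotations can be mapped back to the original labels afterwards.

```r
# Toy marker table with non-numeric cluster labels (illustrative data only).
markers <- data.frame(
  cluster = c("T_cells", "T_cells", "cluster_1", "7_0"),
  gene = c("CD3D", "CD3E", "CD19", "CD14"),
  stringsAsFactors = FALSE
)

# Assign 0-based integer codes in order of first appearance (an assumption)
# and keep a lookup table for translating results back.
labels <- unique(markers$cluster)
lookup <- data.frame(
  cluster_id = seq_along(labels) - 1L,
  original_label = labels,
  stringsAsFactors = FALSE
)
markers$cluster <- lookup$cluster_id[match(markers$cluster, lookup$original_label)]
```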
Calculate simple consensus without LLM
Description
Calculate simple consensus without LLM
Usage
calculate_simple_consensus(round_responses)
Arguments
round_responses |
Vector of model responses |
Value
List with consensus_proportion, entropy, and majority_prediction
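The three returned quantities can be illustrated with a small stand-alone function. This is a hypothetical re-implementation for illustration, not the package's code: it assumes Shannon entropy in bits (log base 2), compares labels after trimming whitespace only, and breaks ties in favour of the alphabetically first label.

```r
# Hypothetical re-implementation for illustration only.
simple_consensus <- function(round_responses) {
  labels <- trimws(round_responses)
  counts <- table(labels)
  p <- as.numeric(counts) / sum(counts)
  list(
    consensus_proportion = max(p),          # share of the most common label
    entropy = -sum(p * log2(p)),            # Shannon entropy in bits (assumed base 2)
    majority_prediction = names(counts)[which.max(counts)]
  )
}

res <- simple_consensus(c("T cell", "T cell ", "B cell"))
unanimous <- simple_consensus(c("NK cell", "NK cell", "NK cell"))
```

With two of three models agreeing, the proportion is 2/3 and the entropy is about 0.918 bits; a unanimous vote gives a proportion of 1 and an entropy of 0.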
Check if consensus is reached among models
Description
Check if consensus is reached among models
Usage
check_consensus(
round_responses,
api_keys = NULL,
controversy_threshold = 2/3,
entropy_threshold = 1,
consensus_check_model = NULL
)
Arguments
round_responses |
A vector of model responses to check for consensus |
api_keys |
A list of API keys for different providers |
controversy_threshold |
Threshold for consensus proportion (default: 2/3) |
entropy_threshold |
Threshold for entropy (default: 1.0) |
consensus_check_model |
Model to use for consensus checking (default: NULL, will try available models in order) |
Note
This function uses create_consensus_check_prompt from prompt_templates.R
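How the two thresholds might combine into a decision is sketched below. The rule shown (proportion at or above `controversy_threshold` and entropy at or below `entropy_threshold`) is an assumption about the comparison directions; the function name `consensus_reached` is hypothetical, and the LLM-based check that the real function can perform is not modelled.

```r
# Assumed rule: both conditions must hold. The comparison directions are a guess.
consensus_reached <- function(consensus_proportion, entropy,
                              controversy_threshold = 2/3,
                              entropy_threshold = 1) {
  consensus_proportion >= controversy_threshold && entropy <= entropy_threshold
}

agreed <- consensus_reached(consensus_proportion = 3/4, entropy = 0.81)
contested <- consensus_reached(consensus_proportion = 1/3, entropy = 1.58)
```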
Clean annotation text by removing prefixes and extra whitespace
Description
Clean annotation text by removing prefixes and extra whitespace
Usage
clean_annotation(annotation)
Arguments
annotation |
The annotation text to clean |
Value
Cleaned annotation text
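A cleaner of this kind might look like the following sketch. Which prefixes the package actually strips is not documented here, so the two patterns below (a leading list number such as "1." and a leading "Cluster N:" tag) and the function name `clean_label` are assumptions.

```r
# Hypothetical cleaner; the prefixes handled here are assumptions.
clean_label <- function(annotation) {
  out <- trimws(annotation)
  # Leading list numbers such as "1. ", "2) " or "3: "
  out <- sub("^[0-9]+[.):]\\s*", "", out)
  # Leading tags such as "Cluster 2: " or "cluster 10 - "
  out <- sub("^cluster\\s*[0-9]+\\s*[-:.]\\s*", "", out, ignore.case = TRUE)
  # Collapse repeated inner whitespace
  gsub("\\s+", " ", trimws(out))
}

cleaned <- clean_label(c("  1. T cells ", "Cluster 2: B  cells", "Monocytes"))
```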
Combine results from all phases of consensus annotation
Description
Combine results from all phases of consensus annotation
Usage
combine_results(initial_results, controversy_results, discussion_results)
Arguments
initial_results |
Results from initial prediction phase |
controversy_results |
Results from controversy identification phase |
discussion_results |
Results from discussion phase |
Value
Combined results
Compare predictions from different models
Description
This function runs the same input through multiple models and compares their predictions. It provides both individual predictions and a consensus analysis.
Usage
compare_model_predictions(
input,
tissue_name,
models = c("claude-sonnet-4-20250514", "claude-3-5-sonnet-20241022", "gpt-4.1-mini",
"deepseek-r1", "gemini-2.5-flash", "qwen-max-2025-01-25", "gpt-4o", "o1",
"grok-3-latest"),
api_keys,
top_gene_count = 10,
consensus_threshold = 0.5
)
Arguments
input |
Either the differential gene table returned by Seurat FindAllMarkers() function, or a list of genes. |
tissue_name |
Required. The tissue type or cell source (e.g., 'human PBMC', 'mouse brain', etc.). |
models |
Vector of model names to compare. The default covers several providers and includes more than one model for some of them. Supported models:
|
api_keys |
Named list of API keys. Can be provided in two formats:
The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:
api_keys <- list(
"openai" = Sys.getenv("OPENAI_API_KEY"),
"anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
"openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
"claude-3-opus" = "sk-ant-api03-specific-key-for-opus"
) |
top_gene_count |
Number of top differential genes to be used if input is Seurat differential genes. |
consensus_threshold |
Minimum proportion of models that must agree for a consensus (default 0.5). |
Value
A list containing individual predictions, consensus results, and agreement statistics.
Note
This function uses create_standardization_prompt from prompt_templates.R
Examples
## Not run:
# Compare predictions using different models
# Keys are named by model here; each name matches an entry in `models`
api_keys <- list(
"claude-sonnet-4-20250514" = "your-anthropic-key",
"deepseek-r1" = "your-deepseek-key",
"gemini-2.5-flash" = "your-gemini-key",
"qwen-max-2025-01-25" = "your-qwen-key"
)
results <- compare_model_predictions(
input = list(gs1=c('CD4','CD3D'), gs2='CD14'),
tissue_name = 'PBMC',
models = names(api_keys), # Only query models that have a key
api_keys = api_keys
)
## End(Not run)
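The role of consensus_threshold can be illustrated with a small base-R sketch. The helper name vote_consensus, the use of >= for the comparison, and the tie-breaking rule are assumptions made for illustration; the package may compute its agreement statistics differently.

```r
# Hypothetical helper: majority vote over one cluster's predictions.
vote_consensus <- function(predictions, consensus_threshold = 0.5) {
  counts <- table(predictions)
  top <- names(counts)[which.max(counts)]   # alphabetically first label wins ties
  proportion <- max(counts) / length(predictions)
  list(
    label = top,
    proportion = as.numeric(proportion),
    reached = proportion >= consensus_threshold   # assumed comparison
  )
}

# Three of four models agree, so the proportion is 0.75.
preds <- c("T cell", "T cell", "NK cell", "T cell")
res <- vote_consensus(preds)
```

With the default threshold of 0.5, this toy cluster reaches consensus; three models that each give a different label would not.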
Set global logger configuration
Description
Set global logger configuration
Usage
configure_logger(level = "INFO", console_output = TRUE, json_format = TRUE)
Arguments
level |
Logging level |
console_output |
Whether to output to console |
json_format |
Whether to use JSON format |
Prompt templates for mLLMCelltype
Description
This file contains all prompt template functions used in mLLMCelltype. These functions create various prompts for different stages of the cell type annotation process.
Create prompt for cell type annotation.
Usage
create_annotation_prompt(input, tissue_name, top_gene_count = 10)
Arguments
input |
Either the differential gene table returned by Seurat's FindAllMarkers() function, or a list of genes |
tissue_name |
The name of the tissue |
top_gene_count |
Number of top differential genes to use per cluster |
Value
A list containing the prompt string and expected count of responses
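As a rough illustration of what top_gene_count controls, the sketch below picks the top genes per cluster from a table shaped like FindAllMarkers() output and builds one line of text per cluster. The helper name top_genes_per_cluster, the ranking by avg_log2FC, and the line format are assumptions; the manual does not state how the package ranks genes or lays out the prompt.

```r
# Toy table with the FindAllMarkers() columns that matter here.
markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1"),
  gene = c("CD3D", "CD4", "IL7R", "CD14", "LYZ"),
  avg_log2FC = c(2.1, 1.4, 1.9, 3.0, 2.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the top n genes per cluster by log fold change.
top_genes_per_cluster <- function(markers, top_gene_count = 10) {
  by_cluster <- split(markers, markers$cluster)
  lapply(by_cluster, function(df) {
    df <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    head(df$gene, top_gene_count)
  })
}

gene_lists <- top_genes_per_cluster(markers, top_gene_count = 2)

# One line of prompt text per cluster (assumed layout).
prompt_lines <- paste0(
  names(gene_lists), ": ",
  vapply(gene_lists, paste, character(1), collapse = ", ")
)
```

The number of clusters in gene_lists corresponds to the "expected count of responses" mentioned in the Value above.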
Create prompt for checking consensus among model predictions
Description
Create prompt for checking consensus among model predictions
Usage
create_consensus_check_prompt(
round_responses,
controversy_threshold = 2/3,
entropy_threshold = 1
)
Arguments
round_responses |
A vector of cell type predictions from different models |
controversy_threshold |
Threshold for consensus proportion (default: 2/3) |
entropy_threshold |
Threshold for entropy (default: 1.0) |
Value
A formatted prompt string for consensus checking
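The two thresholds refer to a consensus proportion and an entropy over the models' responses. A minimal sketch of how such metrics could be computed is shown here; the helper name consensus_metrics and the choice of base-2 Shannon entropy are assumptions, since the manual does not give the formulas.

```r
# Hypothetical helper: agreement metrics for one cluster's round of responses.
consensus_metrics <- function(round_responses) {
  p <- as.numeric(table(round_responses)) / length(round_responses)
  list(
    consensus_proportion = max(p),   # share of the most common label
    entropy = -sum(p * log2(p))      # Shannon entropy in bits (log base assumed)
  )
}

m <- consensus_metrics(c("B cell", "B cell", "B cell", "Plasma cell"))
unanimous <- consensus_metrics(rep("B cell", 4))
```

A unanimous round has proportion 1 and entropy 0; entropy grows as the responses spread across more labels.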
Create prompt for additional discussion rounds
Description
Create prompt for additional discussion rounds
Usage
create_discussion_prompt(
cluster_id,
cluster_genes,
tissue_name,
previous_rounds,
round_number
)
Arguments
cluster_id |
The ID of the cluster being analyzed |
cluster_genes |
The marker genes for the cluster |
tissue_name |
The name of the tissue (optional) |
previous_rounds |
A list of previous discussion rounds |
round_number |
The current round number |
Value
A formatted prompt string for additional discussion rounds
Create prompt for the initial round of discussion
Description
Create prompt for the initial round of discussion
Usage
create_initial_discussion_prompt(
cluster_id,
cluster_genes,
tissue_name,
initial_predictions
)
Arguments
cluster_id |
The ID of the cluster being analyzed |
cluster_genes |
The marker genes for the cluster |
tissue_name |
The name of the tissue (optional) |
initial_predictions |
A list of initial model predictions |
Value
A formatted prompt string for the initial discussion round
Create prompt for standardizing cell type names
Description
Create prompt for standardizing cell type names
Usage
create_standardization_prompt(all_cell_types)
Arguments
all_cell_types |
A vector of cell type names to standardize |
Value
A formatted prompt string for cell type standardization
Custom model manager for mLLMCelltype
Description
This module provides functionality to register and manage custom LLM providers and models. It allows users to integrate their own LLM services with the mLLMCelltype framework.
Usage
custom_providers
Format
An object of class environment of length 0.
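Because custom_providers is an environment, registration presumably amounts to storing entries in it by name. The sketch below shows that general pattern with a stand-in environment; registry and register_provider are hypothetical names, and the stored fields are assumptions rather than the package's internal layout.

```r
# Hypothetical stand-in for a registry environment (starts with length 0).
registry <- new.env(parent = emptyenv())

register_provider <- function(name, process_fn, description = NULL) {
  stopifnot(is.character(name), length(name) == 1, is.function(process_fn))
  if (exists(name, envir = registry, inherits = FALSE)) {
    stop("Provider already registered: ", name)
  }
  assign(name, list(process_fn = process_fn, description = description),
         envir = registry)
  invisible(TRUE)
}

# Register a toy provider that just upper-cases the prompt.
register_provider("echo", function(prompt, model, api_key) toupper(prompt))

providers <- ls(registry)
reply <- get("echo", envir = registry)$process_fn("hello", "any-model", "no-key")
```

An environment is a natural fit here because it is mutable in place, so registrations made in one function call are visible to later calls without reassigning a global list.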
Execute consensus check with retry logic
Description
Execute consensus check with retry logic
Usage
execute_consensus_check(formatted_responses, api_keys, models_to_try)
Arguments
formatted_responses |
Formatted prompt for consensus check |
api_keys |
List of API keys |
models_to_try |
Character vector of models to attempt |
Value
List with success flag and response
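The retry idea can be sketched as trying each candidate model in order until one returns a usable answer. Everything below is illustrative: try_models_in_order and call_model are hypothetical names, and the real function may retry the same model several times or apply other checks.

```r
# Hypothetical helper: try each model in order until one call succeeds.
# `call_model` stands in for whatever function actually queries a model.
try_models_in_order <- function(prompt, models_to_try, call_model) {
  for (model in models_to_try) {
    response <- tryCatch(call_model(prompt, model), error = function(e) NULL)
    if (is.character(response) && length(response) == 1 && nzchar(response)) {
      return(list(success = TRUE, response = response, model = model))
    }
  }
  list(success = FALSE, response = NULL, model = NULL)
}

# Fake backend: the first model always fails, the second answers.
fake_call <- function(prompt, model) {
  if (model == "model-a") stop("rate limited")
  paste("answer from", model)
}

out <- try_models_in_order("check", c("model-a", "model-b"), fake_call)
failed <- try_models_in_order("check", "model-a", fake_call)
```

Returning a list with a success flag, rather than raising an error, lets the caller fall back to a local rule when every model fails.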
Extract numeric value from line containing a label
Description
Extract numeric value from line containing a label
Usage
extract_labeled_value(lines, pattern, value_pattern)
Arguments
lines |
Character vector of all response lines |
pattern |
Pattern to match the label |
value_pattern |
Pattern to extract the numeric value |
Value
Numeric value or NULL if not found
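A minimal sketch of label-based extraction follows. The function extract_value is a hypothetical re-implementation, and the example response lines are invented for illustration; the actual response format the package expects is not shown in this manual.

```r
# Hypothetical re-implementation for illustration only.
extract_value <- function(lines, pattern, value_pattern = "[0-9]+\\.?[0-9]*") {
  hit <- grep(pattern, lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NULL)
  # Search after the label so that digits inside the label are skipped.
  after_label <- sub(paste0(".*?", pattern), "", hit[1],
                     ignore.case = TRUE, perl = TRUE)
  m <- regmatches(after_label, regexpr(value_pattern, after_label))
  if (length(m) == 0) return(NULL)
  as.numeric(m)
}

lines <- c("Consensus reached: yes",
           "Consensus proportion: 0.75",
           "Entropy = 0.81")
prop <- extract_value(lines, "proportion")
ent <- extract_value(lines, "entropy")
missing_value <- extract_value(lines, "confidence")
```

Returning NULL when no label matches makes it easy for a caller to fall back to another parsing strategy.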
Facilitate discussion for a controversial cluster
Description
Facilitate discussion for a controversial cluster
Usage
facilitate_cluster_discussion(
cluster_id,
input,
tissue_name,
models,
api_keys,
initial_predictions,
top_gene_count,
max_rounds = 3,
controversy_threshold = 0.7,
entropy_threshold = 1,
consensus_check_model = NULL
)
Note
This function uses create_initial_discussion_prompt and create_discussion_prompt from prompt_templates.R
Find majority prediction from response lines
Description
Find majority prediction from response lines
Usage
find_majority_prediction(lines)
Arguments
lines |
Character vector of response lines |
Value
Character string of majority prediction
Utility functions for API key management
Description
This file contains utility functions for managing API keys and related operations.
Get API key for a specific model.
Usage
get_api_key(model, api_keys)
Arguments
model |
The name of the model to get the API key for |
api_keys |
Named list of API keys |
Details
This function retrieves the appropriate API key for a given model by first checking the provider name and then the model name in the provided API keys list.
Value
The API key if found, NULL otherwise
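The lookup order described in Details (provider name first, then model name) can be sketched as follows. The helper lookup_api_key is hypothetical and takes the provider as an explicit argument, whereas the package derives it from the model; treating an empty string as a missing key is also an assumption.

```r
# Hypothetical helper mirroring the lookup order described above.
lookup_api_key <- function(model, provider, api_keys) {
  for (name in c(provider, model)) {
    key <- api_keys[[name]]
    # Sys.getenv() returns "" for unset variables, so skip empty strings too.
    if (!is.null(key) && nzchar(key)) return(key)
  }
  NULL
}

api_keys <- list(
  "openai" = "key-for-all-openai-models",
  "claude-3-opus" = "key-for-this-model-only"
)

k1 <- lookup_api_key("gpt-4o", "openai", api_keys)            # found by provider
k2 <- lookup_api_key("claude-3-opus", "anthropic", api_keys)  # found by model
k3 <- lookup_api_key("some-model", "some-provider", api_keys) # not found
```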
Get initial predictions from all models
Description
This function retrieves initial cell type predictions from all specified models. It is an internal helper function used by the interactive_consensus_annotation function.
Usage
get_initial_predictions(
input,
tissue_name,
models,
api_keys,
top_gene_count,
base_urls = NULL
)
Arguments
input |
Either the differential gene table or a list of genes |
tissue_name |
The tissue type or cell source |
models |
Vector of model names to use |
api_keys |
Named list of API keys |
top_gene_count |
Number of top differential genes to use |
base_urls |
Optional custom base URLs for API endpoints |
Value
A list containing individual predictions and successful models
Get the global logger instance
Description
Get the global logger instance
Usage
get_logger()
Value
UnifiedLogger instance
Get response from a specific model
Description
Get response from a specific model
Usage
get_model_response(prompt, model, api_key)
Determine provider from model name
Description
This helper function determines the provider (e.g., OpenAI, Anthropic, Google, OpenRouter) from a model identifier. It is used internally, for example to select which base_url to use from a list of provider-specific URLs.
Usage
get_provider(model)
Arguments
model |
Model identifier |
Details
Supported providers and models include:
OpenAI: 'chatgpt-4o-latest', 'gpt-3.5-turbo', 'gpt-4', 'gpt-4-turbo', 'gpt-4.1', 'gpt-4.1-mini', 'gpt-4.1-nano', 'gpt-4o', 'gpt-4o-mini', 'gpt-5', 'gpt-5-mini', 'gpt-5-nano', 'o1', 'o1-mini', 'o1-pro', 'o3', 'o3-mini', 'o4-mini' and more with date variants
Anthropic: 'claude-opus-4-1-20250805', 'claude-opus-4-20250514', 'claude-sonnet-4-20250514', 'claude-3-7-sonnet-20250219', 'claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022', 'claude-3-opus-20240229'
DeepSeek: 'deepseek-chat', 'deepseek-reasoner'
Google: 'gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash', 'gemini-2.0-flash-lite', 'gemini-1.5-pro-latest', 'gemini-1.5-flash-latest', 'gemini-1.5-flash-8b'
Qwen: 'qwen-max-2025-01-25', 'qwen3-72b'
StepFun: 'step-2-mini', 'step-2-16k', 'step-1-8k'
Zhipu: 'glm-4-plus', 'glm-3-turbo'
MiniMax: 'minimax-text-01'
Grok: 'grok-3', 'grok-3-latest', 'grok-3-fast', 'grok-3-fast-latest', 'grok-3-mini', 'grok-3-mini-latest', 'grok-3-mini-fast', 'grok-3-mini-fast-latest'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'. Available through OpenRouter:
  OpenAI models: 'openai/gpt-4o', 'openai/gpt-4o-mini', 'openai/gpt-4-turbo', 'openai/gpt-4', 'openai/gpt-3.5-turbo'
  Anthropic models: 'anthropic/claude-opus-4.1', 'anthropic/claude-opus-4', 'anthropic/claude-sonnet-4', 'anthropic/claude-3.7-sonnet', 'anthropic/claude-3.5-sonnet', 'anthropic/claude-3.5-haiku', 'anthropic/claude-3-opus'
  Meta models: 'meta-llama/llama-3-70b-instruct', 'meta-llama/llama-3-8b-instruct', 'meta-llama/llama-2-70b-chat'
  Google models: 'google/gemini-2.5-pro', 'google/gemini-2.5-flash', 'google/gemini-2.0-flash', 'google/gemini-1.5-pro-latest', 'google/gemini-1.5-flash'
  Mistral models: 'mistralai/mistral-large', 'mistralai/mistral-medium', 'mistralai/mistral-small'
  Qwen models: 'qwen/qwen3-coder:free', 'qwen/qwen3-235b-a22b-07-25:free', 'qwen/qwen2.5-72b-instruct:free'
  DeepSeek models: 'deepseek/deepseek-r1:free', 'tngtech/deepseek-r1t2-chimera:free'
  Other models: 'microsoft/mai-ds-r1:free', 'moonshotai/kimi-k2:free', 'tencent/hunyuan-a13b-instruct:free'
Value
Character string with the provider name
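One plausible way to map a model identifier to a provider is prefix matching, sketched below for a few of the providers listed in Details. The function guess_provider, the prefix table, and the returned provider strings for DeepSeek and Grok are assumptions for illustration; the package's actual matching rules are not documented here.

```r
# Hypothetical prefix table; only some providers from the list are covered.
provider_prefixes <- list(
  openai = c("gpt-", "chatgpt-", "o1", "o3", "o4"),
  anthropic = "claude-",
  deepseek = "deepseek-",
  grok = "grok-"
)

guess_provider <- function(model) {
  # OpenRouter identifiers use the 'provider/model-name' form.
  if (grepl("/", model, fixed = TRUE)) return("openrouter")
  for (provider in names(provider_prefixes)) {
    if (any(startsWith(model, provider_prefixes[[provider]]))) return(provider)
  }
  stop("Unsupported model: ", model)
}

p1 <- guess_provider("gpt-4o")
p2 <- guess_provider("claude-sonnet-4-20250514")
p3 <- guess_provider("openai/gpt-4o")
```

Checking for the slash first matters in this sketch: 'openai/gpt-4o' would otherwise never match a prefix and would be rejected.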
Identify controversial clusters based on consensus analysis
Description
Identify controversial clusters based on consensus analysis
Usage
identify_controversial_clusters(
input,
individual_predictions,
controversy_threshold,
entropy_threshold,
api_keys,
consensus_check_model = NULL
)
Arguments
input |
Either the differential gene table or a list of genes |
individual_predictions |
List of predictions from each model |
controversy_threshold |
Threshold for marking clusters as controversial |
entropy_threshold |
Entropy threshold for identifying controversial clusters |
api_keys |
Named list of API keys |
consensus_check_model |
Model to use for consensus checking |
Value
A list containing controversial clusters and consensus results
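The thresholding step can be pictured with a toy table of per-cluster metrics. The rule used below (low consensus proportion OR high entropy) is an assumption; the manual only states that clusters with a consensus proportion below controversy_threshold are marked as controversial. The names metrics and flag_controversial are hypothetical.

```r
# Hypothetical per-cluster metrics, e.g. produced by a consensus check.
metrics <- data.frame(
  cluster = c("0", "1", "2"),
  consensus_proportion = c(1.00, 0.50, 0.75),
  entropy = c(0.00, 1.50, 0.81),
  stringsAsFactors = FALSE
)

# Assumed rule: low agreement OR high entropy marks a cluster as controversial.
flag_controversial <- function(metrics,
                               controversy_threshold = 0.7,
                               entropy_threshold = 1) {
  low_agreement <- metrics$consensus_proportion < controversy_threshold
  high_entropy <- metrics$entropy > entropy_threshold
  metrics$cluster[low_agreement | high_entropy]
}

controversial <- flag_controversial(metrics)
stricter <- flag_controversial(metrics, controversy_threshold = 0.8)
```

Raising controversy_threshold sends more clusters to the discussion phase, which costs more API calls but leaves fewer weakly supported labels.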
Interactive consensus building for cell type annotation
Description
This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:
Initial voting by all LLMs
Identification of controversial clusters
Detailed discussion for controversial clusters
Final summary by a designated LLM (default: Claude)
Usage
interactive_consensus_annotation(
input,
tissue_name = NULL,
models = c("claude-sonnet-4-20250514", "claude-3-7-sonnet-20250219",
"claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022", "gemini-2.0-flash",
"gemini-1.5-pro", "qwen-max-2025-01-25", "gpt-4o", "grok-3-latest"),
api_keys,
top_gene_count = 10,
controversy_threshold = 0.7,
entropy_threshold = 1,
max_discussion_rounds = 3,
consensus_check_model = NULL,
log_dir = NULL,
cache_dir = NULL,
use_cache = TRUE,
base_urls = NULL,
clusters_to_analyze = NULL,
force_rerun = FALSE
)
Arguments
input |
One of the following:
|
tissue_name |
Optional tissue name (e.g., 'human PBMC') |
models |
Vector of model names to participate in the discussion. For supported model names, see the Details of get_provider. |
api_keys |
Named list of API keys. Names can be provider names (e.g., "openai") or model names (e.g., "claude-3-opus"). The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:
api_keys <- list(
"openai" = Sys.getenv("OPENAI_API_KEY"),
"anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
"openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
"claude-3-opus" = "sk-ant-api03-specific-key-for-opus"
) |
top_gene_count |
Number of top differential genes to use |
controversy_threshold |
Consensus proportion threshold (default: 0.7). Clusters with consensus proportion below this value will be marked as controversial |
entropy_threshold |
Entropy threshold for identifying controversial clusters (default: 1.0) |
max_discussion_rounds |
Maximum number of discussion rounds for controversial clusters (default: 3) |
consensus_check_model |
Model to use for consensus checking |
log_dir |
Directory for storing logs (defaults to tempdir()) |
cache_dir |
Directory for storing cache (defaults to tempdir()) |
use_cache |
Whether to use cached results |
base_urls |
Optional custom base URLs for API endpoints. Can be:
|
clusters_to_analyze |
Optional vector of cluster IDs to analyze. If NULL (default), all clusters in the input will be analyzed. Must be character or numeric values that match the cluster IDs in your input. Examples:
|
force_rerun |
Logical. If TRUE, ignore cached results and force re-analysis of all specified clusters. Useful when you want to re-analyze clusters with different context or for subtype identification. Default is FALSE. Note: This parameter only affects the discussion phase for controversial clusters. |
Value
A list containing consensus results, logs, and annotations
Get list of registered custom models
Description
Get list of registered custom models
Usage
list_custom_models()
Value
Character vector of model names
Get list of registered custom providers
Description
Get list of registered custom providers
Usage
list_custom_providers()
Value
Character vector of provider names
Convenience functions for logging
Description
Convenience functions for logging
Usage
log_debug(message, context = NULL)
log_info(message, context = NULL)
log_warn(message, context = NULL)
log_error(message, context = NULL)
Arguments
message |
Log message |
context |
Additional context (optional) |
Normalize annotation for comparison
Description
Normalize annotation for comparison
Usage
normalize_annotation(annotation)
Arguments
annotation |
The annotation string to normalize |
Value
Normalized annotation string
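Normalization matters because models often return the same cell type with different casing, punctuation, or plural forms. The sketch below shows one crude approach; normalize_label is a hypothetical helper and its rules are assumptions, not the package's actual normalization.

```r
# Hypothetical helper: make labels comparable across models.
normalize_label <- function(annotation) {
  x <- tolower(trimws(annotation))
  x <- gsub("[^a-z0-9+ ]", " ", x)   # punctuation to spaces, keep '+' (e.g. CD4+)
  x <- trimws(gsub("\\s+", " ", x))  # collapse repeated whitespace
  sub("s$", "", x)                   # crude singularization of a trailing "s"
}

labels <- c("T-cells", "  T Cells. ", "CD4+ T cell")
normalized <- normalize_label(labels)
```

After this step, "T-cells" and "T Cells." compare equal, so they would count as agreement in a vote.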
Parse consensus response from model
Description
Parse consensus response from model
Usage
parse_consensus_response(response)
Arguments
response |
Character string response from model |
Value
List with consensus results
Parse flexible format consensus response
Description
Parse flexible format consensus response
Usage
parse_flexible_format(lines)
Arguments
lines |
Character vector of all response lines |
Value
List with parsed values
Parse standard 4-line consensus response format
Description
Parse standard 4-line consensus response format
Usage
parse_standard_format(result_lines)
Arguments
result_lines |
Character vector of 4 lines |
Value
List with parsed values or NULL if not standard format
Prepare list of models to try for consensus checking
Description
Prepare list of models to try for consensus checking
Usage
prepare_models_list(consensus_check_model = NULL)
Arguments
consensus_check_model |
User-specified model (can be NULL) |
Value
Character vector of models in order of preference
Print summary of consensus results
Description
This function prints a detailed summary of the consensus building process, including initial predictions from all models, uncertainty metrics, and final consensus for each controversial cluster.
Usage
print_consensus_summary(results)
Arguments
results |
A list containing consensus annotation results with the following components:
|
Value
None, prints summary to console
Process request using Anthropic models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_anthropic(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
Anthropic API key |
base_url |
Optional custom base URL for Anthropic API |
Value
Processed response as character vector
Process controversial clusters through discussion
Description
Process controversial clusters through discussion
Usage
process_controversial_clusters(
controversial_clusters,
input,
tissue_name,
successful_models,
api_keys,
individual_predictions,
top_gene_count,
controversy_threshold,
entropy_threshold,
max_discussion_rounds,
cache_manager,
use_cache,
consensus_check_model = NULL,
force_rerun = FALSE
)
Arguments
controversial_clusters |
List of controversial cluster IDs |
input |
Either the differential gene table or a list of genes |
tissue_name |
The tissue type or cell source |
successful_models |
Vector of successful model names |
api_keys |
Named list of API keys |
individual_predictions |
List of predictions from each model |
top_gene_count |
Number of top differential genes to use |
controversy_threshold |
Threshold for marking clusters as controversial |
entropy_threshold |
Entropy threshold for identifying controversial clusters |
max_discussion_rounds |
Maximum number of discussion rounds for controversial clusters |
cache_manager |
Cache manager object |
use_cache |
Whether to use cached results |
consensus_check_model |
Model to use for consensus checking |
force_rerun |
Whether to force re-analysis, ignoring cache |
Value
A list containing discussion logs and final annotations
Process request using custom provider
Description
Process request using custom provider
Usage
process_custom(prompt, model, api_key)
Process request using DeepSeek models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_deepseek(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
DeepSeek API key |
base_url |
Optional custom base URL for DeepSeek API |
Value
Processed response as character vector
Process request using Gemini models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_gemini(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
Gemini API key |
base_url |
Optional custom base URL for Gemini API |
Value
Processed response as character vector
Process request using Grok models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_grok(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
Grok API key |
base_url |
Optional custom base URL for Grok API |
Value
Processed response as character vector
Process request using MiniMax models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_minimax(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
MiniMax API key |
base_url |
Optional custom base URL for MiniMax API |
Value
Processed response as character vector
Process request using OpenAI models
Description
Main function that creates an OpenAI processor and handles the request. This maintains backward compatibility with the existing API.
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_openai(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
OpenAI API key |
base_url |
Optional custom base URL for OpenAI API |
Value
Processed response as character vector
Process request using OpenRouter models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_openrouter(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
OpenRouter API key |
base_url |
Optional custom base URL for OpenRouter API |
Value
Processed response as character vector
Process request using Qwen models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_qwen(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
Qwen API key |
base_url |
Optional custom base URL for Qwen API |
Value
Processed response as character vector
Process request using StepFun models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_stepfun(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
StepFun API key |
base_url |
Optional custom base URL for StepFun API |
Value
Processed response as character vector
Process request using Zhipu models
Description
This function uses the new BaseAPIProcessor architecture for improved maintainability and consistent logging across all API providers.
Usage
process_zhipu(prompt, model, api_key, base_url = NULL)
Arguments
prompt |
Input prompt text |
model |
Model identifier |
api_key |
Zhipu API key |
base_url |
Optional custom base URL for Zhipu API |
Value
Processed response as character vector
Register a custom model for a provider
Description
Register a custom model for a provider
Usage
register_custom_model(model_name, provider_name, model_config = list())
Arguments
model_name |
Character string, unique identifier for the model |
provider_name |
Character string, name of the registered provider |
model_config |
List of model-specific configuration parameters |
Value
Invisibly returns TRUE if registration is successful
Examples
## Not run:
register_custom_model(
model_name = "my_model",
provider_name = "my_provider",
model_config = list(
temperature = 0.7,
max_tokens = 2000
)
)
## End(Not run)
Register a custom LLM provider
Description
Register a custom LLM provider
Usage
register_custom_provider(provider_name, process_fn, description = NULL)
Arguments
provider_name |
Character string, unique identifier for the provider |
process_fn |
Function that processes prompts and returns responses. Must accept parameters: prompt, model, api_key |
description |
Optional description of the provider |
Value
Invisibly returns TRUE if registration is successful
Examples
## Not run:
register_custom_provider(
provider_name = "my_provider",
process_fn = function(prompt, model, api_key) {
# Custom implementation
response <- httr::POST(
url = "your_api_endpoint",
body = list(prompt = prompt),
encode = "json"
)
return(httr::content(response)$choices[[1]]$text)
}
)
## End(Not run)
URL Utilities for Base URL Resolution
Description
This file contains utility functions for resolving and validating custom base URLs for different API providers.
Resolve provider-specific base URL.
Usage
resolve_provider_base_url(provider, base_urls)
Arguments
provider |
Provider name (e.g., "openai", "anthropic") |
base_urls |
User-provided base URLs (string or named list) |
Value
Resolved base URL or NULL
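Since base_urls may be a single string or a named list, resolution has to handle both shapes. A minimal sketch under that assumption is shown here; resolve_base_url is a hypothetical helper and the example URLs are placeholders.

```r
# Hypothetical helper: accept NULL, one URL string, or a named list of URLs.
resolve_base_url <- function(provider, base_urls) {
  if (is.null(base_urls)) return(NULL)
  if (is.character(base_urls) && length(base_urls) == 1 &&
      is.null(names(base_urls))) {
    return(base_urls)                 # one URL applied to every provider
  }
  as.list(base_urls)[[provider]]      # NULL when the provider has no entry
}

u1 <- resolve_base_url("openai", NULL)
u2 <- resolve_base_url("openai", "https://proxy.example.com/v1")
u3 <- resolve_base_url("anthropic", list(openai = "https://a.example.com/v1"))
u4 <- resolve_base_url("openai", list(openai = "https://a.example.com/v1"))
```

A NULL result would then mean "use the provider's default endpoint".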
Sanitize base URL
Description
Sanitize base URL
Usage
sanitize_base_url(url)
Arguments
url |
URL to sanitize |
Value
Sanitized URL
Select the best prediction from consensus results
Description
Select the best prediction from consensus results
Usage
select_best_prediction(consensus_result, valid_predictions)
Arguments
consensus_result |
Consensus analysis result |
valid_predictions |
Valid predictions for the cluster |
Value
The best prediction
Standardize cell type names using a language model
Description
This function takes predictions from multiple models and standardizes the cell type nomenclature to ensure consistent naming across different models' outputs.
Usage
standardize_cell_type_names(
predictions,
models,
api_keys,
standardization_model = "claude-sonnet-4-20250514"
)
Arguments
predictions |
List of predictions from different models |
models |
Vector of model names that successfully completed predictions |
api_keys |
Named list of API keys. Names can be provider names or model names. |
standardization_model |
Model to use for standardization (default: "claude-sonnet-4-20250514") |
Value
List of standardized predictions with the same structure as the input
Summarize discussion and determine final cell type
Description
NOTE: This function is currently not in use. The consensus_annotation.R file now directly extracts the majority_prediction from the last round of discussion. This function is kept for potential future use or reference.
Usage
summarize_discussion(discussion_log, cluster_id, model, api_key)
Arguments
discussion_log |
Discussion log for a cluster |
cluster_id |
Cluster identifier |
model |
Model to use for summary |
api_key |
API key for the model |
Value
Final cell type determination
Validate base URL format
Description
Validate base URL format
Usage
validate_base_url(url)
Arguments
url |
URL to validate |
Value
TRUE if valid, FALSE otherwise
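Sanitizing and validating a base URL can be sketched with two small base-R helpers. Both is_valid_base_url and clean_base_url are hypothetical, and the rules they apply (http or https scheme, trimmed whitespace, no trailing slash) are assumptions; the package's own checks may be stricter or looser.

```r
# Hypothetical validator: a single http(s) URL with a non-empty host part.
is_valid_base_url <- function(url) {
  is.character(url) && length(url) == 1 && !is.na(url) &&
    grepl("^https?://[^[:space:]/]+", url)
}

# Hypothetical sanitizer: trim whitespace and drop trailing slashes.
clean_base_url <- function(url) {
  url <- trimws(url)
  sub("/+$", "", url)
}

cleaned <- clean_base_url("  https://api.example.com/v1/  ")
ok <- is_valid_base_url(cleaned)
bad <- is_valid_base_url("not a url")
```

Removing the trailing slash up front avoids double slashes when an endpoint path is appended later.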