Embeddings
LM Deluge includes a standalone embeddings module for generating text embeddings in parallel from OpenAI and Cohere. It handles batching, retries, and concurrency, and tracks token usage and cost as it runs.
Supported Models
| Model | Provider | Dimensions | $/1M tokens |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | $0.02 |
| text-embedding-3-large | OpenAI | 3072 | $0.13 |
| text-embedding-ada-002 | OpenAI | 1536 | $0.10 |
| embed-v4.0 | Cohere | 256 / 512 / 1024 / 1536 | $0.12 |
| embed-english-v3.0 | Cohere | 1024 | $0.10 |
| embed-english-light-v3.0 | Cohere | 384 | $0.10 |
| embed-multilingual-v3.0 | Cohere | 1024 | $0.10 |
| embed-multilingual-light-v3.0 | Cohere | 384 | $0.10 |
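
As a rough illustration of how the per-token prices in the table translate into spend, here is a minimal sketch that estimates cost from a token count. `PRICE_PER_MILLION` and `estimate_cost` are hypothetical names made up for this example, not part of lm_deluge; the prices are copied from the table and only a few models are listed.

```python
# Prices in USD per 1M tokens, copied from the table above.
# This dictionary is illustrative and is not lm_deluge's internal registry.
PRICE_PER_MILLION = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
    "embed-v4.0": 0.12,
}


def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of embedding `tokens` tokens with `model`."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


# 160 tokens on text-embedding-3-small costs a tiny fraction of a cent.
print(f"${estimate_cost('text-embedding-3-small', 160):.6f}")  # $0.000003
```

The 160-token figure is consistent with the sample summary line shown under Cost Tracking, which reports the same rounded cost.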
Quick Start
```python
import asyncio

from lm_deluge.embed import embed_parallel_async, stack_results

texts = [
    "The cat sat on the mat.",
    "Machine learning is a subset of AI.",
    "Python is a popular programming language.",
]


async def main():
    results = await embed_parallel_async(texts, model="text-embedding-3-small")
    embeddings = stack_results(results)  # list of list[float]
    print(f"Got {len(embeddings)} embeddings of dim {len(embeddings[0])}")


asyncio.run(main())
```

There’s also a synchronous wrapper if you’re not in an async context:
```python
from lm_deluge.embed import embed_sync

embeddings = embed_sync(texts, model="text-embedding-3-small")
```

Cost Tracking
The progress bar shows running cost and token count as batches complete:

```
Embedding [text-embedding-3-small]: 75%|███████▌ | 3/4 [00:00, $0.000002 | 120 tok]
Embedded 20 texts in 4 batches | 160 tokens | $0.000003
```

Each EmbeddingResponse also includes a tokens_used field:
```python
results = await embed_parallel_async(texts, model="text-embedding-3-small")
total_tokens = sum(r.tokens_used for r in results)
```

Cohere embed-v4.0
Cohere’s latest model supports configurable output dimensions via the output_dimension parameter:

```python
results = await embed_parallel_async(
    texts,
    model="embed-v4.0",
    output_dimension=256,  # 256, 512, 1024, or 1536 (default)
)
```

You can also set input_type for Cohere models (defaults to "search_document"):

```python
# For embedding search queries (not documents)
results = await embed_parallel_async(
    queries,
    model="embed-v4.0",
    input_type="search_query",
)
```

Valid input_type values: search_document, search_query, classification, clustering.
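
One common reason to embed queries and documents with different input_type values is retrieval: documents are embedded as search_document, the query as search_query, and documents are then ranked by cosine similarity to the query. The following is a minimal sketch of that ranking step only, using small made-up vectors in place of real embeddings. `cosine_similarity` and `rank_documents` are hypothetical helpers written for this example, not part of lm_deluge.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

print(rank_documents(query_vec, doc_vecs))  # [0, 2, 1]
```

In practice, the vectors would presumably come from two calls like the ones above: one over the documents and one over the queries.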
Configuration
```python
results = await embed_parallel_async(
    texts,
    model="text-embedding-3-small",  # any model from the registry
    batch_size=64,                   # texts per API call (max 96)
    max_concurrent_requests=64,      # max parallel requests
    max_attempts=5,                  # retries per batch
    request_timeout=30,              # seconds per request
    show_progress=True,              # tqdm progress bar
)
```

Working with Results
embed_parallel_async returns a list of EmbeddingResponse objects (one per batch). Use stack_results to flatten them into a single list of vectors:
```python
from lm_deluge.embed import embed_parallel_async, stack_results

results = await embed_parallel_async(texts, model="text-embedding-3-small")

# Flatten to a plain list of vectors
embeddings = stack_results(results)  # raises if any batch failed

# Or inspect individual batches
for r in results:
    print(f"Batch {r.id}: {len(r.embeddings)} vectors, {r.tokens_used} tokens")
    if r.is_error:
        print(f"  Error: {r.error_message}")
```

Environment Variables
Set the appropriate API key for your provider:

- OpenAI: OPENAI_API_KEY
- Cohere: COHERE_API_KEY