Changelog
LM Deluge iterates quickly. This page calls out the highlights from the most recent releases. For a blow-by-blow history you can always inspect `git log`, but the sections below should help you catch up, starting with v0.0.62.
0.0.138 · 2026-04-08
- Cloudflare Workers AI provider: New provider with 6 models — Kimi K2.5, GLM-4.7-Flash, GPT-OSS-120B, Llama 4 Scout, Gemma 4 26B, and Nemotron 3 120B. Uses Cloudflare's OpenAI-compatible endpoint with a thin custom handler for account ID injection. Requires the `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` env vars. Model IDs are suffixed `-cf` (e.g. `kimi-k2.5-cf`, `llama-4-scout-cf`).
- Gemini batch processing: Added `submit_batches_gemini` for Gemini's native batch API with file upload, polling, and result retrieval. Integrates with the existing `submit_batch_job`/`wait_for_batch_job` client methods.
- GPT-5.4 mini and nano: Added `gpt-5.4-mini` ($0.75/$4.50) and `gpt-5.4-nano` ($0.20/$1.25) to the OpenAI registry with xhigh reasoning and verbosity support.
- Arcee Trinity models: Added `trinity-large-thinking` and `trinity-large-preview` reasoning models from Arcee AI.
0.0.136 · 2026-03-17
- GPT-5.4 mini and nano: Added `gpt-5.4-mini` and `gpt-5.4-nano` to the OpenAI registry with the same reasoning and capability handling as `gpt-5.4`, plus the new pricing tiers. Added a live one-off smoke test that exercises both models against the OpenAI API and verifies cost accounting.
0.0.135 · 2026-03-09
- Audio file type inference: When passing raw `bytes` to `transcribe_async`/`transcribe_sync`, the library now uses `filetype` to detect the actual audio format (mp3, ogg, flac, etc.) instead of blindly assuming WAV. A new `AudioSource` type alias also accepts `(bytes, filename)` tuples so callers can preserve the original format and extension when the bytes come from an in-memory source.
0.0.133 · 2026-03-08
- Audio transcription: New `lm_deluge.transcribe` module — a lightweight, standalone transcription client (like `embed`). Supports OpenAI (`whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`), Mistral (`voxtral-mini-latest`), Fireworks (`whisper-v3`, `whisper-v3-turbo`), and Deepgram (`nova-3`, `nova-2`). Parallel batch transcription with rate limiting, retries, cost tracking, and progress bars. Use `transcribe_async`/`transcribe_sync` with file paths, `Path` objects, or raw bytes.
- Auto-splitting for long audio: When a file exceeds a model's duration or file-size limit, the module automatically splits it into chunks via ffmpeg, transcribes the chunks in parallel, and stitches the results back together — timestamps, segments, and words are all adjusted to reflect the original file's timeline. If ffmpeg isn't installed and the file is within limits, it sends directly; if the file exceeds limits and ffmpeg is missing, a clear error message tells the user what to do.
- `TranscriptionResponse`: Results include `.text`, `.language`, `.duration`, `.segments` (list of `TranscriptionSegment` with start/end/speaker), and `.words`. Duration and language are filled in from ffprobe when the API doesn't return them (e.g. the gpt-4o-transcribe models).
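Stitching chunked transcripts back onto the original timeline amounts to offsetting each chunk's timestamps by the total duration of the chunks before it. A simplified sketch, not the library's actual implementation (the segment dicts and `stitch_segments` are illustrative):

```python
def stitch_segments(chunks: list, chunk_durations: list) -> list:
    """Merge per-chunk segment dicts into one timeline by offsetting start/end times."""
    merged, offset = [], 0.0
    for segments, duration in zip(chunks, chunk_durations):
        for seg in segments:
            merged.append({**seg, "start": seg["start"] + offset, "end": seg["end"] + offset})
        offset += duration  # the next chunk starts where this one ended in the original file
    return merged
```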
0.0.131 · 2026-03-05
- GPT-5.4 models: Added `gpt-5.4` ($2.50/$15) and `gpt-5.4-pro` ($30/$180) — OpenAI's latest reasoning models with image support and `xhigh` reasoning effort.
- Verbosity parameter: New `verbosity` param on `LLMClient` and `SamplingParams` as a unified alias for output effort control. Maps to OpenAI's native `verbosity` API field (low/medium/high) on GPT-5+ models, and to Anthropic's `output_config.effort` on Claude 4.5+/4.6 models. Cross-provider values are normalized automatically (e.g. `"max"` → `"high"` for OpenAI with a warning). `verbosity` and `global_effort` stay synced — setting one sets the other, and providing conflicting values raises `ValueError`.
- Lazy effort defaults: `global_effort` no longer defaults to `"high"` in `SamplingParams`. Instead, the Anthropic request builder applies the `"high"` default only for models that support GA effort, avoiding unnecessary params for other providers.
- Warning cleanup: Anthropic's `json_mode`-without-schema warning now fires once via `maybe_warn` instead of printing every time. New warning keys for unsupported verbosity/effort combinations.
- Proxy server verbosity: The OpenAI-compatible proxy server now accepts and forwards the `verbosity` field in chat completion requests.
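The cross-provider normalization described above can be sketched as a small clamp-and-validate step. This is an illustrative sketch under stated assumptions (`normalize_verbosity` is hypothetical; the real library also syncs the value with `global_effort` and emits warnings):

```python
def normalize_verbosity(value: str, provider: str) -> str:
    """Normalize a unified verbosity value for a given provider (illustrative sketch)."""
    # Assumption: OpenAI's verbosity field accepts only low/medium/high,
    # while Anthropic's effort scale also has a "max" tier.
    allowed = {"low", "medium", "high"} if provider == "openai" else {"low", "medium", "high", "max"}
    if provider == "openai" and value == "max":
        return "high"  # clamped; the real library prints a warning here
    if value not in allowed:
        raise ValueError(f"unsupported verbosity {value!r} for {provider}")
    return value
```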
0.0.130 · 2026-03-03
- Session sharing: Batch requests (`process_batch`, `batch_agent_loop`) now share a single `aiohttp.ClientSession` across all tasks, reducing connection overhead and improving throughput for large batches. The connector pool is sized with headroom above `max_concurrent_requests` so aiohttp never becomes the hidden bottleneck.
- Improved rate-limit dispatch: Replaced `_wait_for_capacity` with `_consume_rate_capacity` — rate-limit waiting now happens before acquiring the concurrency semaphore, fixing head-of-line blocking for mixed-size requests. Wait times are computed dynamically from the RPM/TPM deficit instead of fixed sleeps. Requests that can never fit the TPM budget raise immediately with a clear error instead of hanging forever.
- Graceful exception handling in batches: `process_batch` and `batch_agent_loop` now use `return_exceptions=True` in `asyncio.gather`, so a single failing task no longer cancels all siblings. Exceptions are converted to error `APIResponse` objects, giving callers a uniform list back.
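The deficit-based wait computation can be sketched as follows: given the tokens a request needs, the tokens currently available, and the per-minute refill budget, the wait is the deficit divided by the refill rate, and a request larger than the whole budget fails immediately. Names here are illustrative, not lm_deluge internals:

```python
def seconds_until_capacity(needed_tokens: int, available_tokens: float, tokens_per_minute: int) -> float:
    """How long to wait before `needed_tokens` fit the TPM budget (illustrative sketch)."""
    if needed_tokens > tokens_per_minute:
        # No amount of waiting helps: the request can never fit the budget.
        raise ValueError(f"request needs {needed_tokens} tokens but the budget is {tokens_per_minute}/min")
    deficit = needed_tokens - available_tokens
    if deficit <= 0:
        return 0.0
    return deficit / (tokens_per_minute / 60.0)  # refill rate in tokens per second
```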
0.0.129 · 2026-03-03
- Gemini 3.1 Flash Lite: Added `gemini-3.1-flash-lite-preview` (native Gemini API) and `gemini-3.1-flash-lite-compat` (OpenAI-compatible endpoint). Google's most cost-efficient multimodal model — $0.25/M input, $1.50/M output. Supports thinking, structured outputs, and function calling (native endpoint only; the OpenAI-compat endpoint doesn't support tool calling for Gemini 3.x due to thought signature requirements).
0.0.128 · 2026-03-03
- Dynamic-filtering web search tool: Added `web_search_tool_dynamic()` for Anthropic's new `web_search_20260209` tool, which lets Claude write and execute code to filter search results before they enter the context window. Only supported on Claude Opus 4.6 and Sonnet 4.6 — raises `ValueError` for unsupported models. The API auto-injects the code execution tool, so you don't need to pass it separately. The existing `web_search_tool()` (`web_search_20250305`) remains available for all models.
- `user_location` for web search: Both `web_search_tool()` and `web_search_tool_dynamic()` now accept a `user_location` dict to localize search results by city, region, country, and timezone.
- Graceful `ClientOSError` handling: `aiohttp.ClientOSError` (connection reset, broken pipe, etc.) now returns a structured error `APIResponse` instead of an unhandled exception traceback.
0.0.127 · 2026-03-02
- Bedrock API key authentication: Bedrock now supports the new AWS Bedrock API keys alongside the existing SigV4 (access key + secret) auth flow. Set `BEDROCK_API_KEY`, `AWS_BEDROCK_API_KEY`, or `AWS_BEARER_TOKEN_BEDROCK` to use simple Bearer-token auth — no `requests-aws4auth` dependency needed. When both an API key and IAM credentials are present, the API key takes precedence. Auth logic is consolidated into a shared `bedrock_auth` module, removing duplication across the Anthropic, OpenAI, and Nova Bedrock request builders.
- JustBashSandbox: New cross-platform sandbox powered by Vercel's just-bash. Provides read-scoped, network-disabled, copy-on-write execution without Docker or platform-specific tooling. Supports configurable `root_dir`, `working_dir`, read-only mode, optional Python via `enable_python`, auto-install of the `just-bash` CLI, background process tracking, and the same `bash`/`list_processes` tool interface as other sandboxes. See Tool Use for usage.
0.0.126 · 2026-02-27
- Inception (Mercury 2) support: Added the `mercury-2` model from Inception Labs — a fast diffusion-based LLM with 128K context. Uses the OpenAI-compatible endpoint at `api.inceptionlabs.ai`. Supports tool calling, structured outputs, and `reasoning_effort` with Mercury-specific mapping (`"none"`/`"minimal"` → `"instant"` mode for near-zero-latency responses). Requires the `INCEPTION_API_KEY` env var.
0.0.125 · 2026-02-27
- Bedrock updates: The Anthropic Bedrock provider deprecated older models that are no longer supported, added more US regions for US cross-region inference, and added support for Global cross-region inference via `-global` models that work across many regions for Claude 4 onwards. This allows much higher TPM/RPM than the previous setup, which only used us-west. Global is opt-in: choose the `-global`-suffixed models. Make sure all regions are enabled in your AWS account to avoid errors from disabled regions.
0.0.124 · 2026-02-26
- Automatic prompt caching: New `cache="automatic"` pattern that sets the top-level `cache_control` flag on Anthropic requests, letting the provider decide what to cache instead of manually specifying cache breakpoints. Supported in `LLMClient`, `run_agent_loop`, and the proxy server (`DELUGE_CACHE_PATTERN=automatic`). Bedrock does not support this mode and will emit a warning and fall back to no caching if `automatic` is requested.
- PybubbleSandbox: New Linux sandbox backed by pybubble (bubblewrap). Provides filesystem and process isolation without Docker — just needs `bwrap` installed. Supports configurable network access (`network_access`, `outbound_access`, `allow_host_loopback`), optional fallback to host-network sharing in restricted runtimes (requires `allow_host_loopback=True`), background process tracking, and the same `bash`/`list_processes` tool interface as other sandboxes. Requires the `pybubble` dependency and Linux. See Tool Use for usage.
0.0.123 · 2026-02-25
- Agent loop final-turn warning: When `run_agent_loop` reaches its last round (`max_rounds`), a user message is now injected telling the model it must return a text response and cannot call any more tools. This prevents agents from wasting the final turn on tool calls that will never be executed, making `SubAgentManager` and agent loops in general more reliable.
0.0.122 · 2026-02-24
- New `VectorDBManager` prefab tool: In-process vector database for agents, backed by numpy with brute-force cosine similarity search. Exposes `insert`, `search`, `get`, `delete`, `count`, and `list` commands. Supports a pluggable `VectorDBBackend` ABC so you can swap in heavier stores (USearch, turbopuffer, etc.) without changing tool wiring. Ships with `InProcessVectorDB` for small-to-medium collections (~100k vectors). Available via the new `vector_db` optional extra (`pip install lm_deluge[vector_db]`).
- Retry-After header support for rate limiting: All providers (Anthropic, OpenAI, Gemini, Bedrock, Bedrock Nova, Mistral) now parse `retry-after` and `retry-after-ms` response headers on 429s and use the server-suggested cooldown duration instead of a fixed pause. Capped at `MAX_COOLDOWN_SECONDS` for safety.
- Improved cooldown logging: Rate-limit pause messages now print once per cooldown event (not once per waiting task) and show the actual pause duration, reducing log noise during heavy parallel runs.
- Relaxed `markdownify-rs` version: Changed from pinned `==0.1.1` to `>=0.1.1`.
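Brute-force cosine search over an in-memory collection is simple enough to sketch in pure Python. This is illustrative only; `InProcessVectorDB` uses numpy and a richer record format:

```python
import math

def cosine_top_k(query: list, vectors: dict, k: int = 3) -> list:
    """Score every stored vector against the query and return the top-k (id, score) pairs."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = [(key, cos(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]
```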
0.0.120 · 2026-02-20
- New `SqliteManager` prefab tool: Schema-first SQLite tool for agents — supports `list_tables`, `describe_table`, and `query` commands with progressive disclosure. Build a DB from a list of dicts via `SqliteManager.from_dicts()` (auto-infers column types, including JSON columns) or point at an existing `.db` file. Supports read-only mode, parameterized queries, multiple output formats (JSON, YAML, CSV, TSV), row-count truncation, and sample rows. See Tool Use for usage.
- Retired `claude-3.5-haiku`: Removed `claude-3.5-haiku` (`claude-3-5-haiku-20241022`) from the model registry — the model has been retired by Anthropic. Use `claude-4.5-haiku` instead. Remaining references to 3.5-haiku in examples and tests have been updated to 4.5-haiku.
0.0.119 · 2026-02-19
- Embedding rate limiting: `embed_parallel_async` now accepts `max_requests_per_minute` and `max_tokens_per_minute` parameters, using the same `StatusTracker` capacity system as `LLMClient`. Rate-limit (429) responses trigger automatic cooldown. Previously these were accepted as `**kwargs` and leaked into the API payload, causing request failures.
- Deterministic test suite: Added `tests/core/test_embed.py` with 15 tests covering request building, response parsing, payload isolation (control kwargs never leak to the API), rate-limit param acceptance, and edge cases — no live API calls required.
0.0.118 · 2026-02-19
- Embeddings rewrite: Completely rewrote `lm_deluge.embed` — the old implementation was broken (it crashed immediately due to a serialization bug). The new module uses `asyncio.gather` + `Semaphore` for clean parallel batching with per-request sessions, exponential backoff retries, and live cost/token tracking in the progress bar.
- Cohere v2 API: Switched Cohere embeddings from the deprecated v1 to the v2 endpoint (`/v2/embed`).
- New model `embed-v4.0`: Added Cohere's latest embedding model with configurable output dimensions (256, 512, 1024, 1536) and a 128k context window.
- Cost tracking: Embeddings now track tokens and cost live in the tqdm progress bar and print a summary on completion. Each `EmbeddingResponse` includes a `tokens_used` field. Updated pricing for all models.
- `embed_sync` helper: New synchronous convenience wrapper that returns a flat list of embedding vectors.
- New docs page: Added Embeddings documentation with a model table, examples, and a configuration reference.
0.0.116 · 2026-02-17
- Claude Sonnet 4.6 support: Added `claude-4.6-sonnet` (`claude-sonnet-4-6`) with $3/$15 pricing, GA structured outputs, image input, and reasoning support. Also added Bedrock entries for both 4.6 models (`claude-4.6-opus-bedrock`, `claude-4.6-sonnet-bedrock`).
- Adaptive thinking default for all 4.6 models: Both Opus 4.6 and Sonnet 4.6 now default to `thinking: {type: "adaptive"}` when no explicit `thinking_budget` is set. Explicit `budget_tokens` still works but emits a deprecation warning.
- GA effort parameter for Sonnet 4.6: `global_effort` and `reasoning_effort` now map to `output_config.effort` for Sonnet 4.6 (previously only Opus 4.5/4.6). The `-low`, `-medium`, `-high` model name suffixes work as expected (e.g. `claude-4.6-sonnet-medium`).
- Prefill blocking for Sonnet 4.6: Assistant message prefill is now rejected for all 4.6 models (was previously Opus-only), matching the upstream API behavior.
- Model aliases: Models can now define an `aliases` list so common alternative names resolve to the same model. For example, `claude-sonnet-4-6` (the API name) now resolves to `claude-4.6-sonnet`, and `claude-haiku-4.5` resolves to `claude-4.5-haiku`. Aliases work with reasoning suffixes too (e.g. `claude-sonnet-4-6-high`). All Anthropic models with differing API names have aliases configured.
- Duplicate model registration warning: `register_model()` now prints a warning when a model id or alias collides with an existing registry entry, catching configuration bugs at import time.
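Alias resolution boils down to an index that maps every alias and canonical id to the canonical id, with collisions flagged at registration time. A hedged sketch (`build_alias_index` is illustrative, not the library's API, and suffix handling is omitted):

```python
def build_alias_index(models: dict) -> dict:
    """Map every canonical model id and each of its aliases to the canonical id."""
    index = {}
    for model_id, spec in models.items():
        index[model_id] = model_id
        for alias in spec.get("aliases", []):
            if alias in index:
                # The real register_model() prints a warning on collisions like this.
                print(f"warning: alias {alias!r} collides with an existing registry entry")
            index[alias] = model_id
    return index
```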
0.0.115 · 2026-02-12
- File and image URL passthrough: `File` and `Image` objects now accept HTTP(S) URLs as `data` and pass them directly to providers that support it (OpenAI Responses API, Anthropic, Google Gemini), avoiding unnecessary download and base64-encoding. Providers that don't support URL passthrough (OpenAI Chat Completions for files, Mistral, Nova) automatically fall back to downloading and base64-encoding.
- Fixed OpenAI Responses `input_file` deserialization: `Conversation.from_openai_chat()` now correctly parses `input_file` blocks where `file_url`, `file_data`, and `filename` are at the top level of the block (as emitted by the Responses API), not nested inside a sub-dict.
0.0.114 · 2026-02-12
- Dropped `tiktoken` dependency: Removed `tiktoken` from project dependencies (`pyproject.toml` and `requirements.txt`) and switched `Conversation.count_tokens()` to a lightweight heuristic (`len(text) // 4`) for text token estimation.
- Fixed OpenAI Responses usage parsing: `Usage.from_openai_usage()` now supports both OpenAI shapes: Chat Completions (`prompt_tokens`/`completion_tokens`, `prompt_tokens_details`) and Responses API (`input_tokens`/`output_tokens`, `input_tokens_details`), including cache-read token extraction in both cases.
- Added coverage for usage/cost correctness:
  - Extended `tests/core/test_incomplete_response.py` to assert usage mapping for Responses payloads and added a direct compatibility test for both OpenAI usage formats.
  - Added a new live test, `tests/models/test_gpt_5_2_responses_cost_live.py`, that uses `dotenv`, calls `gpt-5.2` with `use_responses_api=True`, and verifies non-zero usage/cost plus exact usage-to-cost reconciliation against model pricing.
- Test cleanup: Minor formatting/type-hint cleanup in `tests/models/test_xhigh_reasoning.py` to keep lint/type checks clean.
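The replacement heuristic is the classic rule of thumb of roughly four characters per token for English text:

```python
def estimate_tokens(text: str) -> int:
    """Cheap token estimate: ~4 characters per token for English text, minimum 1."""
    return max(1, len(text) // 4)
```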
0.0.113 · 2026-02-11
- Anthropic tool schemas strip unsupported constraints: Tool schemas sent to Anthropic now automatically remove numeric constraints (`minimum`, `maximum`, etc.) that the API rejects, folding them into the property's `description` so the model still sees the intent. This applies to `Tool.for_anthropic()` (both strict and non-strict modes) and to raw dict tool definitions passed through `tools=`, including nested `custom` tool schemas. Caller-provided dicts are never mutated.
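Folding unsupported constraints into the description can be sketched as a recursive copy that moves the offending keys into a parenthetical note. A simplified illustration (the real implementation handles arrays and more keywords; copying, rather than mutating, is what keeps caller dicts intact):

```python
UNSUPPORTED = ("minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum")

def strip_numeric_constraints(schema: dict) -> dict:
    """Return a copy of `schema` with numeric constraints folded into the description."""
    out, notes = {}, []
    for key, value in schema.items():
        if key in UNSUPPORTED:
            notes.append(f"{key}: {value}")  # preserve the intent as prose
        elif isinstance(value, dict):
            out[key] = strip_numeric_constraints(value)  # recurse into nested schemas
        else:
            out[key] = value
    if notes:
        desc = out.get("description", "")
        out["description"] = (desc + " " if desc else "") + "(" + ", ".join(notes) + ")"
    return out
```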
0.0.112 · 2026-02-09
- JSON healing: escape unescaped interior quotes: `load_json` now detects and escapes double quotes inside JSON string values that weren't properly escaped (e.g. `"the agent ("attorney-in-fact") is authorized"`). Uses structural context — a quote only counts as a real string terminator if the next non-whitespace character is a JSON delimiter (`,`, `}`, `]`, `:`). Tried as a later fallback after bracket/comma healing to avoid false positives on valid JSON.
0.0.111 · 2026-02-08
- Immediate error on impossible token budget: Requests whose estimated token count (prompt + `max_new_tokens`) exceeds `max_tokens_per_minute` now raise a `ValueError` immediately instead of hanging forever waiting for capacity that can never be granted.
0.0.110 · 2026-02-07
- Added full Claude Opus 4.6 request support in the Anthropic builder: adaptive thinking (`thinking: {type: "adaptive"}`), 128k-style large-output compatibility plumbing, `inference_geo` passthrough, and Opus 4.6 assistant-prefill rejection behavior.
- Migrated Opus effort handling to the GA shape for both `claude-4.5-opus` and `claude-4.6-opus` via `output_config.effort` (including support for `global_effort="max"`), removing reliance on the old effort beta header path.
- Added compatibility passthrough for the deprecated Anthropic `output_format` by mapping it to `output_config.format`, while preserving native `output_config.format` structured outputs.
- Expanded proxy/server Anthropic compatibility models and adapters to parse/forward `output_config`, the deprecated `output_format`, and `inference_geo`.
- Added/updated regression coverage in `tests/models/test_anthropic_thinking_budget.py`, `tests/core/test_server_adapters.py`, and `tests/core/test_new_llmclient_api.py`.
- Added live network validation in `tests/one_off/test_anthropic_opus_46_features_live.py` (loads `.env` via `dotenv.load_dotenv()`), covering Opus 4.5/4.6 effort, adaptive thinking, `inference_geo`, the deprecated `output_format`, and prefill rejection.
0.0.108 · 2026-02-04
- OpenAI Responses API now supports images in tool results: Tool results containing `[Text(...), Image(...)]` lists are now properly serialized as arrays with `input_text` and `input_image` types, allowing models to see images returned by tools natively.
- Fixed OpenAI Chat Completions image extraction: Tool results with images now correctly append a user message containing the extracted images (previously the message was created but never added to the request).
- Image detail field: `Image.oa_resp()` now includes the `detail` field for controlling image processing fidelity.
0.0.107 · 2026-02-04
- Anthropic structured outputs are GA: Structured outputs now use `output_config.format` (no beta header), and Claude 4.5 models are marked JSON-capable.
- FilesystemManager zip URLs: `FilesystemManager.from_zip()` now accepts `http(s)` URLs to preload a workspace from a remote zip file.
- Curl tool jq piping: `get_curl_tool()` now allows piping output to `jq` for JSON filtering while continuing to block other shell pipes.
0.0.105 · 2026-02-02
- New `get_curl_tool()` prefab: A lightweight curl tool for making HTTP requests without needing a full sandbox. Validates commands to prevent shell injection, whitelists safe flags, and blocks requests to localhost/private IPs. Pair it with `FilesystemManager` for agents that need to fetch data and read/write files. See Tool Use for usage.
- Verbose mode for agent loops: Pass `verbose=True` to `run_agent_loop()`, `run_agent_loop_sync()`, `start_agent_loop_nowait()`, or the batch variants to print each tool call and result as the agent runs. Long arguments and outputs are automatically truncated for readability.
0.0.104 · 2026-02-02
- Lazy imports for prefab tools: The `lm_deluge.tool.prefab` module now uses lazy imports, so importing one tool (e.g., `ModalSandbox`) no longer requires dependencies for unrelated tools (e.g., `lenlp`/`tantivy` for `FullTextSearchManager`). Each tool's dependencies are only loaded when that specific tool is used.
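Module-level lazy imports typically rely on PEP 562's module `__getattr__`: the expensive import runs only when the attribute is first accessed. A self-contained sketch of the pattern (illustrative; `make_lazy_module` and the factory mapping are not lm_deluge's actual code):

```python
import types

def make_lazy_module(name: str, factories: dict) -> types.ModuleType:
    """Build a module whose attributes are created on first access (PEP 562 __getattr__)."""
    mod = types.ModuleType(name)
    cache = {}

    def __getattr__(attr):
        if attr not in factories:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        if attr not in cache:
            cache[attr] = factories[attr]()  # the heavy import would happen here, once
        return cache[attr]

    mod.__getattr__ = __getattr__
    return mod
```

In a real package the same effect comes from defining `__getattr__` at module level in `__init__.py` and importing the submodule inside it.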
0.0.103 · 2026-01-31
- Model fallbacks, blocklisting, and stickiness: Multi-model clients now support intelligent fallback behavior. Configure `prefer_model="claude-4-sonnet"` to always try your preferred model first with automatic failover, or use `prefer_model="last"` for multi-turn conversations to stick to whichever model was used previously (survives serialization via `conv.model_used`). Models that fail with unrecoverable errors (401, 403, 404) are automatically blocklisted for the client's lifetime, while rate limits and server errors trigger retries. Agent loops automatically maintain stickiness across tool-calling rounds.
- Added new models: `kimi-k2.5` (Moonshot), `glm-4.7-flash-openrouter`, `trinity-large-openrouter`, and `kimi-k2.5-openrouter` via OpenRouter.
- Added pricing metadata to existing Kimi models (`kimi-k2`, `kimi-k2-turbo`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`).
- New documentation page: Model Fallbacks & Stickiness, covering the three key patterns (primary + fallback, load balancing, multi-turn stickiness).
0.0.102 · 2026-01-29
- Added ZAI (Zhipu AI) models: `glm-4.7`, `glm-4.7-flash`, `glm-4.6`, `glm-4.5`, and `glm-4.5-air` via the ZAI API with the Anthropic-compatible spec.
- Replaced the `fastmcp` and `mcp` dependencies with a minimal built-in MCP client implementation (`lm_deluge.mcp`), reducing the dependency footprint while supporting HTTP and stdio transports for tool listing and calling.
- Cleaned up JSON parsing: removed debug print statements from `try_load_json()`.
- Moved examples into proper documentation pages under `/docs/examples/`, covering batch processing, chat loops, computer use, and streaming.
0.0.101 · 2026-01-14
- Added a Claude Code skill (`lm_deluge.skill.SKILL.md`) with embedded usage documentation for Claude Code IDE integrations.
- Improved aiohttp client connector error messages with clearer diagnostics when a connection fails.
- Updated Slack notification formatting for better readability.
0.0.100 · 2026-01-10
- Fixed OpenAI Responses API handling of reasoning models with tools: reasoning content (summary blocks) now correctly serializes as `reasoning`-type items with the proper `summary` field structure, fixing issues where tool calls would fail on models like `o4-mini`.
- Added test coverage for Responses API reasoning + tool interactions in `tests/core/test_openai_responses_reasoning_tools.py`.
0.0.99 · 2026-01-10
Bugfixes for the OpenAI Responses API and client-side tools.
0.0.98 · 2026-01-10
- OpenAI Responses API now supports client-side tool execution: when you pass `Tool` objects (or local MCP servers with `force_local_mcp=True`) to `start`, `start_nowait`, `process_prompts_async`, etc., the client automatically runs an internal tool loop — calling your tools, collecting results, and continuing until the model finishes. Although this effectively runs an agent loop (for which dedicated methods already exist), it brings client-side tools into parity with how the Responses API behaves when no client-side tools are involved: you get back a completed response, with all the tool calls and reasoning that led to it. We decided that should match whether or not you have client-side tools, so just for the Responses API, tools are auto-run.
- Responses API response parsing now preserves raw item payloads (`raw_item`) in `ToolCall.extra_body` for function calls, MCP calls, web search, and other built-in tools, making it easier to reconstruct the exact request format when needed.
- `Thinking` parts from the Responses API now include `summary` and `raw_payload` fields for richer introspection.
- Agent loops (`run_agent_loop`) now raise `NotImplementedError` when `use_responses_api=True` to prevent confusion — use `start()` instead, which handles the tool loop automatically.
- Added test coverage for Responses API tool call handling in `tests/core/test_openai_responses_tool_calls.py`.
0.0.97 · 2026-01-10
- Fixed GPT-5 reasoning effort defaults: GPT-5 models no longer special-case to `minimal` effort when none is specified; they now follow the standard `low` default like other reasoning models.
- Enabled JSON mode support (`supports_json: True`) for the GPT-5 Codex variants (`gpt-5.1-codex`, `gpt-5.1-codex-mini`, `gpt-5-codex`) and `gpt-5-chat-latest`.
- Updated the Cerebras model catalog: added `glm-4.7-cerebras` (ZAI GLM 4.7); temporarily disabled preview models (`llama-4-scout`, `llama-4-maverick`, `qwen-3-235b-thinking`, `qwen-3-coder`) pending availability.
0.0.96 · 2026-01-07
- Agent loops now accept an `on_round_complete` callback on `run_agent_loop()`/`run_agent_loop_sync()`/`start_agent_loop_nowait()` (and the batch agent loop helpers) for per-round hooks.
- New tool execution helpers: `execute_tool_calls()` (plus `Tool.find()`) for running `ToolCall`s locally and collecting `(tool_call_id, result)` tuples.
- Conversation ergonomics: `Conversation.with_tool_results()` for adding tool outputs in bulk, and `with_tool_result()`/`Message.with_tool_result()` now accept dict results.
- Added core test coverage for agent loop callbacks and the new tool helper utilities.
0.0.95 · 2026-01-06
- Added a `PhilipsHueManager` prefab (`lm_deluge.tool.prefab`) for controlling Philips Hue lights via the local bridge API (list lights, on/off, color, brightness; `HUE_BRIDGE_IP` + `HUE_API_KEY`).
- Added an experimental `lm_deluge.pipelines.heartbeat` starter for running a model on a schedule.
- Added a one-off live test for `PhilipsHueManager` (`tests/one_off/test_philips_hue_live.py`).
0.0.94 · 2026-01-01
- Added `get_response_files()` to `lm_deluge.util.anthropic_files` to download Anthropic response files in-memory (optionally resolving real filenames via metadata).
- Anthropic requests now populate `APIResponse.finish_reason` from `stop_reason`.
- `Message.user(..., file=...)` now accepts a `File` object directly.
- Added a one-off regression test for Anthropic `finish_reason` parsing (`tests/one_off/test_anthropic_finish_reason.py`).
0.0.93 · 2026-01-01
- Added Anthropic Skills support: pass `skills=[Skill(...)]` to `start()`, `run_agent_loop()`, or batch methods to use Anthropic's built-in skills (xlsx, pptx) or custom uploaded skills.
- New `Skill` class (`lm_deluge.Skill`) for defining skills with `type` (anthropic/custom), `skill_id`, and `version`.
- File download utilities in `lm_deluge.util.anthropic_files`: `download_anthropic_file()`, `save_response_files()`, and `get_anthropic_file_metadata()` for retrieving files generated by skills.
- `ToolResult` now includes a `files` field for code execution outputs, and a `ContainerFile` TypedDict for file metadata.
- Container ID reuse: `container_id` parameter on `start()`/`run_agent_loop()` and automatic reuse within agent loops to maintain state across turns.
- Skills documentation page added to the docs site.
0.0.92 · 2026-01-01
- Added Amazon Nova support on Bedrock (new request handler, model registry entries, prompt/tool/image serialization, and cache-point handling).
- Expanded the Azure catalog with OpenAI-compatible model definitions, dotenv-aware `AZURE_URL` lookup, and the Responses API enabled only for OpenAI-family models.
- Added Tavily and Brave web search managers plus a configurable `WebSearchManager` to mix search/fetch backends.
- Tavily extract now guarantees markdown output by converting HTML responses with markdownify when needed.
- New one-off test suites for Nova Bedrock, Azure models, and Tavily/Brave/WebSearchManager coverage.
0.0.91 · 2025-12-28
- CLI overhaul: installable `deluge` and `deluge-server` entrypoints with `list`, `run`, and `agent` subcommands, model filtering, JSON output, stdin/file inputs, image prompts, and MCP/prefab-enabled agent loops.
- Model registry now tracks provider and `supports_images` metadata and exposes `find_models()` for filtering/sorting by capabilities and cost.
- OpenRouter catalog expanded with NVIDIA Nemotron 3 Nano 30B (free/paid), Nemotron Nano 12B v2 VL (free/paid vision), Mistral Devstral 2 (free/paid), Xiaomi Mimo V2 Flash (free), AllenAI OLMo 3.1 32B Think (free), and a Trinity Mini free SKU. Removed retired Anthropic models.
- Prompt refactor: `Conversation.system`/`user` are now instance methods (use `Conversation().system(...).user(...)`), added `Conversation.ai`, and prompt primitives (`File`, `Image`, etc.) live under `lm_deluge.prompt` with top-level re-exports; `RequestContext` moved to `lm_deluge.api_requests.context`.
- MCP tooling adds `MCPServer.from_mcp_config` for Claude Desktop config parsing, and `MCPServer` is now exported at the top level.
- Dependencies trimmed: removed numpy/pandas; the embedding `stack_results()` now returns Python lists only; logprob utilities use `math`.
- Config cleanup: dropped `SamplingParams.to_vllm` and `ComputerUseParams`.
- Docs and repo hygiene: added proxy server docs + nav entry, refreshed README/examples for the new Conversation builder, added lint helper scripts (banned strings/weird spaces/max lines).
0.0.90 · 2025-12-26
- Proxy server adds a configurable model policy (allowlists, defaults, alias routes) with CLI/config support, optional request/provider logging, forwarded `anthropic-beta` headers, and richer Anthropic request support (thinking config, expanded content blocks).
- Added thought signature preservation for Gemini 3 and Anthropic responses (including redacted thinking), with updated adapters and tests.
- Sandbox prefabs reorganized into a package and expanded with a macOS-only SeatbeltSandbox and coverage for the new sandbox flows.
- Added `tinker://` OpenAI-compatible model auto-registration with multipart message flattening.
- Message/Conversation `to_log` and `from_log` can optionally preserve image/file bytes (base64) for round-trip serialization.
- OpenRouter catalog expands with `minimax-m2.1` plus free `gpt-oss-20b`/`gpt-oss-120b` entries.
- Provider compatibility fixes: Anthropic batch submission now posts JSON payloads (no temp JSONL), and Gemini tool schemas strip `additionalProperties`.
0.0.89 · 2025-12-17
- Added `gemini-3-flash-preview` model with v1alpha API endpoint and pricing ($0.50/$3.00 per million input/output tokens).
- Gemini 3 Flash supports `minimal` and `medium` thinking levels directly, unlike Gemini 3 Pro, which only supports `low` and `high`. The request builder now detects Flash vs. Pro and passes the appropriate `thinkingLevel` values.
- Added test coverage for Flash-specific thinking levels in `tests/models/test_gemini_3_thinking_level.py`.
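The Flash-vs-Pro detection can be pictured as a small lookup plus fallback. A hypothetical sketch (the helper name and the exact fallback for unsupported levels are assumptions, not lm-deluge's code):

```python
# Hypothetical helper illustrating the Flash-vs-Pro mapping described above.
FLASH_LEVELS = {"minimal", "low", "medium", "high"}
PRO_LEVELS = {"low", "high"}


def resolve_thinking_level(model: str, requested: str) -> str:
    allowed = FLASH_LEVELS if "flash" in model else PRO_LEVELS
    if requested in allowed:
        return requested
    # Lightweight efforts collapse to "low" (assumption for "medium" on Pro);
    # anything heavier collapses to "high".
    return "low" if requested in {"none", "minimal", "medium"} else "high"
```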
0.0.88 · 2025-12-15
- New Recursive Language Model (RLM) manager/pipeline brings a tool-driven REPL for very long contexts, with persistent state, guarded imports, `lm()` fan-out, and `final()`/`final_var()` completion; covered by new core and long-context suites (including a 1.5M-char Ulysses run).
- Added a Tantivy-powered FullTextSearch prefab (`search`/`fetch` tools) with query sanitization, optional dedupe, cached fetches, and a BrowseComp-Plus benchmark harness to stress it against ~100k docs.
- Expanded sandbox prefabs: Modal, Daytona, Docker, and Fargate sandboxes now expose bash/file/process/tunnel helpers with async context managers and background process tracking, plus one-off tests for Docker/Daytona cleanup paths.
- Packaging: introduced optional extras `full_text_search` (tantivy + lenlp) and `sandbox` (modal, daytona-sdk, docker) so heavyweight deps are opt-in; removed the unused `deduplicate_strategy` parameter from the FTS API.
0.0.87 · 2025-12-11
- Added `xhigh` reasoning effort support for GPT-5.2 and GPT-5.1-Codex-Max, the two models that support OpenAI's new extra-high reasoning tier. Other reasoning models automatically fall back to `high` with a warning.
- Model name suffixes now support `-xhigh` (e.g., `gpt-5.2-xhigh`) alongside the existing `-low`, `-medium`, `-high`, `-minimal`, and `-none` suffixes.
- Fixed GPT-5.2 and GPT-5.1-Codex-Max requests to omit `temperature` and `top_p` when reasoning is enabled, matching OpenAI's new API constraints for these models.
- Added `supports_xhigh` flag to `APIModel` for models that support the xhigh reasoning tier.
- Added comprehensive test coverage in `tests/models/test_xhigh_reasoning.py`.
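Suffix parsing of this kind has one subtle ordering requirement: `-xhigh` must be checked before `-high`, or it would be mis-split. A sketch under that assumption (the real parser lives inside lm-deluge and may differ in names and details):

```python
# Longest/most-specific suffixes first, so "-xhigh" is not caught by "-high".
EFFORT_SUFFIXES = ("-xhigh", "-high", "-medium", "-low", "-minimal", "-none")


def split_effort_suffix(model_name: str):
    """Return (base_model, effort); effort is None when no suffix matches."""
    for suffix in EFFORT_SUFFIXES:
        if model_name.endswith(suffix):
            return model_name[: -len(suffix)], suffix[1:]
    return model_name, None
```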
0.0.86 · 2025-12-04
- Fixed a critical bug in the agent loop where `conversation.with_tool_result()` wasn't being reassigned, causing tool results to be silently dropped from the conversation history.
- OpenAI web search tool now defaults to GA mode (`preview=False`) instead of preview.
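The dropped-tool-result bug is an instance of a classic pitfall with immutable builders: the method returns a new object, so forgetting to reassign silently discards the update. An illustrative reproduction (names are stand-ins, not lm-deluge's internals):

```python
class Conversation:
    def __init__(self, messages=()):
        self.messages = list(messages)

    def with_tool_result(self, result):
        # Returns a *new* conversation; the original is untouched.
        return Conversation(self.messages + [("tool", result)])


convo = Conversation()
convo.with_tool_result("42")          # bug: the returned object is discarded
assert convo.messages == []
convo = convo.with_tool_result("42")  # fix: reassign the returned value
assert convo.messages == [("tool", "42")]
```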
0.0.85 · 2025-12-04
- Added `max_content_chars` parameter to `ExaWebSearchManager` for controlling response size.
- Enhanced OpenAI built-in web search tool with better configuration options.
- Added comprehensive test coverage for OpenAI web search in `tests/core/test_openai_web_search.py`.
0.0.84 · 2025-12-04
- Added TryCua integration for computer use agents, with full executor implementation supporting screenshots, clicks, typing, scrolling, and multi-step tasks.
- Added Anthropic built-in web search tool support with test coverage.
- Added Gemini computer use via Kernel executor with dedicated test suite.
- Added batch agent loops capability for running multiple agent conversations in parallel.
- Registered new Gemini models including `gemini-2.5-pro` and `gemini-2.5-flash`.
0.0.83 · 2025-12-02
- More prefab tools: added Google Docs tools (metadata, ranged reads/grep, markdown-aware insert/update/replace/delete) and Google Sheets tools (list tabs, find used ranges, read ranges as HTML tables, update cells) with service-account auth. Added an Exa Web Search tool (more web search tools coming), an AWS SES Email tool, and S3-backed filesystem and memory.
- Model catalog: registered Arcee `trinity-mini` (native, OpenRouter, Together), refreshed DeepSeek pricing plus new reasoner/speciale variants (including an Anthropic-compatible path), and marked Kimi thinking SKUs as reasoning models with a warning when thinking is disabled.
- Client knobs & coverage: `LLMClient` now accepts `global_effort` and `thinking_budget` at construction so Anthropic-style requests carry the right effort settings, and new suites cover the prefab tools, Arcee tool-calling, DeepSeek Speciale, and S3 integrations.
0.0.82 · 2025-11-30
- Added `LLMClient.print_usage()` and refactored `StatusTracker.log_usage()` so you can dump cumulative token/cost/time stats mid-run; final status output now reuses the same usage reporter.
- Drafted a GEPA pipeline implementation plan (`src/lm_deluge/pipelines/gepa/GEPA_IMPLEMENTATION_PLAN.md`) outlining how to port the GEPA optimizer onto lm-deluge.
0.0.81 · 2025-11-26
- Tooling overhaul: `Tool.from_function` now uses Pydantic `TypeAdapter` for schemas, supports `Annotated[...]` descriptions, extracts a return-type `output_schema` (with optional runtime validation), and auto-converts TypedDict/Pydantic params. Serialization still honors strict/non-strict modes automatically.
- New prefab helpers: `ToolComposer` (OTC) for code-based tool orchestration, `BatchTool` for bundling calls, `ToolSearchTool` for regex discovery + invocation, and `MemoryManager` for long-lived notes. Todos/subagents/filesystem managers stay available under `lm_deluge.tool.prefab`.
- Pipelines split: `extract`, `translate`, and `score_llm` now live in `lm_deluge.pipelines`.
- Modal sandbox `bash` drops timeouts for background commands and exposes `bash`/`list_processes`/`get_url` (network optional); docs updated accordingly.
- Agent ergonomics: `Conversation.print()` pretty-prints conversations with truncation, and Open Tool Composition prompts now render available tool signatures correctly.
- Robustness: Anthropic requests now map `global_effort` to `output_config.effort`, and aiohttp `ServerDisconnectedError` surfaces a structured `APIResponse` instead of an exception.
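Deriving a tool schema from a function signature is the core idea behind `Tool.from_function`. A stdlib-only sketch of that idea, handling only primitive types (lm-deluge itself uses Pydantic's `TypeAdapter`, which also covers `Annotated` descriptions, TypedDicts, and Pydantic models):

```python
import inspect
import typing

PRIMITIVES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_from_function(fn):
    """Build a minimal JSON-Schema-like dict from a function signature.
    Simplified sketch: primitives only, no descriptions or nested models."""
    hints = typing.get_type_hints(fn)
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": PRIMITIVES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # params without defaults are required
    return {"type": "object", "properties": properties, "required": required}


def add(a: int, b: int = 0) -> int:
    return a + b
```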
0.0.80 · 2025-11-24
- Added `global_effort` to `SamplingParams` and Anthropic request wiring so `claude-4.5-opus` sends the new `effort` field plus beta header automatically.
- Exposed `thinking_budget` on `SamplingParams` and made it take precedence over `reasoning_effort` for Anthropic and Gemini reasoning models (with warnings to flag overlaps); Gemini flash-lite enforces its minimum budget.
- Fixed Gemini 3 request construction to always send `generationConfig.thinkingConfig` and remapped `reasoning_effort="medium"`/`None` to the provider-supported thinking levels.
- Default temperature raised to `1.0` across docs and config defaults to match current provider behavior.
- Added regression suites for Anthropic thinking budgets and Gemini reasoning/effort mapping (`tests/models/test_anthropic_thinking_budget.py`, `tests/models/test_gemini_thinking_config.py`, `tests/models/test_gemini_3_thinking_level.py`).
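The precedence rule above (explicit token budget beats named effort level, with a warning on overlap) can be sketched as a small resolver. Names and return shape are assumptions, not the library's wiring:

```python
import warnings


def resolve_thinking(reasoning_effort=None, thinking_budget=None):
    """Sketch of the precedence rule: an explicit token budget wins over a
    named effort level, and setting both emits a warning."""
    if thinking_budget is not None:
        if reasoning_effort is not None:
            warnings.warn("thinking_budget overrides reasoning_effort")
        return {"budget_tokens": thinking_budget}
    if reasoning_effort is not None:
        return {"effort": reasoning_effort}
    return {}
```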
0.0.79 · 2025-11-22
- Gemini 3 requests now send `thinkingLevel="low"` when callers specify `reasoning_effort="none"` or `"minimal"`, avoiding unexpected high-effort reasoning (and cost) when users explicitly ask for lightweight runs.
- Documented the new sandbox utilities (`ModalSandbox` and `DaytonaSandbox`) so agents can execute commands in managed remote environments with optional network blocking, stdout capture, file I/O, and preview tunnels.
0.0.78 · 2025-11-19
- Fixed `FilesystemManager.read_file` so empty files no longer throw range errors when agents omit `end_line`; blank files now return an empty snippet and accurate metadata instead of failing mid-run.
- Added regression coverage in `tests/test_filesystem.py::test_filesystem_manager_reads_empty_files` to lock the behavior down.
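The fix boils down to guarding the line-slicing path against empty input. A hypothetical helper showing the shape of that guard (not lm-deluge's actual code):

```python
def read_snippet(text, start_line=1, end_line=None):
    """Slice a 1-indexed line range out of file text without raising on
    empty files. Illustrative sketch of the guard described above."""
    lines = text.splitlines()
    if not lines:
        # Empty file: return an empty snippet plus accurate metadata.
        return {"snippet": "", "total_lines": 0}
    stop = len(lines) if end_line is None else min(end_line, len(lines))
    return {"snippet": "\n".join(lines[start_line - 1 : stop]),
            "total_lines": len(lines)}
```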
0.0.77 · 2025-11-19
- Added `FilesystemManager`, an in-memory virtual workspace + tool wrapper that gives agents sandboxed `read_file`/`write_file`/`list_dir`/`grep`/`apply_patch` capabilities without touching the host filesystem; the implementation lives in `lm_deluge.tool.prefab.filesystem`.
- Landed regression coverage in `tests/test_filesystem.py` plus a scripted live scenario in `tests/test_filesystem_live.py` so refactors keep the tool contract intact.
- Documented the new manager throughout the README, feature guide, and API reference so it is easy to wire into existing agent loops (including tips on seeding backends, exporting workspaces, and disabling commands per session).
0.0.76 · 2025-11-18
- Introduced `SubAgentManager`, a trio of tools (`start_subagent`, `check_subagent`, `wait_for_subagent`) that lets a primary agent delegate work to cheaper models; real-world coverage lives in `tests/core/test_subagent_manager.py` and the new Agent guide sections spell out the workflow.
- Shipped `TodoManager`/`TodoItem`/`TodoStatus`/`TodoPriority`, giving LLMs a first-class todo scratchpad they can mutate via `todowrite`/`todoread`; the integration suite in `tests/core/test_todo_manager.py` ensures models follow the protocol.
- `LLMClient` now exposes `start_agent_loop_nowait()` + `wait_for_agent_loop()` around a new `AgentLoopResponse`, so you can launch parallel loops and gather the `(Conversation, APIResponse)` later; `tests/core/test_agent_loop.py` adds scenarios for concurrent loops and the docs (features, agents guide, API reference) walk through the new APIs.
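The launch-now/gather-later split maps onto plain asyncio primitives. A sketch of the pattern with stand-ins (these functions are illustrative, not lm-deluge's API):

```python
import asyncio


async def agent_loop(name):
    """Stand-in for a real agent loop (model calls plus tool execution)."""
    await asyncio.sleep(0)
    return f"{name}: done"


async def main():
    # start_agent_loop_nowait(...) corresponds to create_task;
    # wait_for_agent_loop(...) corresponds to awaiting the task later.
    tasks = [asyncio.create_task(agent_loop(n)) for n in ("a", "b", "c")]
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```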
0.0.75 · 2025-11-16
- `output_schema` now accepts raw JSON Schemas or Pydantic `BaseModel` subclasses. `lm_deluge.util.schema.prepare_output_schema()` handles the conversion to strict JSON Schema (adds `additionalProperties: false`, expands `$defs`, keeps optional fields nullable, etc.) and feeds both the Anthropic and OpenAI builders, with coverage in `tests/core/test_schema_transformations.py` and `tests/core/test_pydantic_structured_outputs.py`.
- Anthropic/OpenAI structured output requests now share the same normalization path so provider quirks stay isolated: unsupported Anthropic constraints move into descriptions while OpenAI keeps the tight grammar untouched. Regression suites for the chat and Responses APIs plus new real-run harnesses (`tests/one_off/test_anthropic_structured_outputs_real.py`, `tests/one_off/test_openai_structured_outputs_real.py`) make sure the wiring keeps working.
- Shipped `examples/pydantic_structured_outputs_example.py` and refreshed the structured outputs docs so teams can drop a Pydantic model into `LLMClient.process_prompts_*()` without hand-rolling schemas or worrying about mutation.
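The core of the strict-schema conversion is mechanical: every object node gets `additionalProperties: false` and all of its properties marked required. A simplified sketch of that step only (the real `prepare_output_schema()` also expands `$defs` and keeps optional fields nullable):

```python
def to_strict_schema(schema):
    """Recursively normalize a JSON Schema toward OpenAI's strict contract.
    Simplified sketch; sorts `required` for deterministic output."""
    if isinstance(schema, dict):
        schema = {k: to_strict_schema(v) for k, v in schema.items()}
        if schema.get("type") == "object":
            schema["additionalProperties"] = False
            schema["required"] = sorted(schema.get("properties", {}))
        return schema
    if isinstance(schema, list):
        return [to_strict_schema(v) for v in schema]
    return schema


strict = to_strict_schema({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "meta": {"type": "object", "properties": {"id": {"type": "integer"}}},
    },
})
```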
0.0.74 · 2025-11-15
- Structured outputs landed across Anthropic and OpenAI: `LLMClient(..., output_schema=...)` now pushes the JSON Schema to Claude (complete with the `structured-outputs-2025-11-13` beta and strict-tool gating) and to both OpenAI chat and Responses API requests, with schema precedence over `json_mode` everywhere.
- Tightened tool serialization so strict schemas only turn on when providers actually support it (Bedrock always forces non-strict) and made MCP-backed OpenAI Responses runs share the same strict/non-strict behavior; covered by fresh suites in `tests/core/test_openai_structured_outputs.py` and `tests/core/test_bedrock_requests.py`.
- `process_prompts_sync()` forwards `output_schema`, and the new regression test (`tests/core/test_process_prompts_sync.py`) ensures future changes keep the sync/async surfaces aligned.
- Added one-off real API coverage for OpenAI structured outputs plus a battery of deterministic unit tests so regressions in schema handling or strict tooling are caught automatically.
0.0.73 · 2025-11-13
- Added the GPT-5.1 family (standard, Codex, Codex Mini) with pricing metadata and marked them as reasoning models so they Just Work with `LLMClient`.
- Extended reasoning suffix parsing to accept `-minimal` and `-none`, enforced that Codex variants must run against the Responses API, and added guard rails that convert unsupported efforts to the closest valid value with clear warnings.
- Updated the OpenAI request builders plus the warning system so GPT-5.1 downgrades from `minimal` to `none` transparently while older models downgrade to `low`, and added coverage for the new models (`tests/models/test_gpt_5_1.py`).
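The downgrade rule above is a two-way branch on model family. A hypothetical sketch (the function and the prefix check are illustrative, not the library's implementation):

```python
def downgrade_effort(model: str, effort: str) -> str:
    """Map an unsupported "minimal" effort to the closest supported tier:
    GPT-5.1 accepts "none", older reasoning models fall back to "low"."""
    if effort != "minimal":
        return effort
    return "none" if model.startswith("gpt-5.1") else "low"
```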
0.0.72 · 2025-11-11
- Background requests now honour `request_timeout` precisely: polling uses a monotonic clock, cancels the remote response before erroring, and surfaces a structured timeout `APIResponse` instead of hanging jobs.
- Cancellation is best-effort logged when failures happen so you can trace leaked jobs during debugging.
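Monotonic-clock polling is the key detail: unlike wall-clock time, `time.monotonic()` never jumps backward, so the deadline is exact. A hypothetical helper showing the shape of the loop (not the library's code):

```python
import time


def poll_until(check, timeout_s, interval_s=0.01):
    """Poll `check` until it returns a non-None value or the deadline passes.
    Uses a monotonic clock, so wall-clock adjustments can't skew the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(interval_s)
    # Timed out: the caller cancels the remote job and builds a
    # structured timeout response instead of hanging.
    return None
```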
0.0.71 · 2025-11-10
- `Conversation.from_openai_chat()` now filters out whitespace-only text blocks and skips empty messages so bad payloads from upstream providers no longer crash tool execution.
- `MockAsyncOpenAI` does a real conversion from OpenAI tool definitions into lm-deluge `Tool` objects, wires them through `LLMClient.start()`, and carries the active `CachePattern`, so you can run copilot-style tools under tests without custom glue.
- Added a focused test suite for the mock client (`tests/test_mock_openai.py`) that exercises the OpenAI-compatible surface area.
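The payload sanitization can be sketched as a two-stage filter: drop whitespace-only text blocks, then drop any message left with no content. Names are illustrative, not lm-deluge's internals:

```python
def clean_messages(messages):
    """Drop whitespace-only text blocks, then drop now-empty messages."""
    cleaned = []
    for msg in messages:
        blocks = [
            b for b in msg.get("content", [])
            if not (b.get("type") == "text" and not b.get("text", "").strip())
        ]
        if blocks:  # skip messages that had nothing but blank text
            cleaned.append({**msg, "content": blocks})
    return cleaned


raw = [
    {"role": "user", "content": [{"type": "text", "text": "   "}]},
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
]
cleaned = clean_messages(raw)
```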
0.0.70 · 2025-11-09
- Packaging now re-exports AsyncOpenAI-style exception classes (`APIError`, `APITimeoutError`, `BadRequestError`, `RateLimitError`) so verifier harnesses can catch them directly from `lm_deluge`.
- `MockAsyncOpenAI` gained full parity with the official `AsyncOpenAI` signature: you can pass `api_key`, `organization`, `project`, custom base URLs, and call the legacy `.completions.create()` path in addition to chat completions.
- Added an async `close()` no-op for compatibility, together with extensive tests to ensure verifier integrations behave as expected.
0.0.69 · 2025-11-09
- Introduced the optional `lm-deluge[openai]` extra and shipped the first cut of `MockAsyncOpenAI`, giving you an on-device OpenAI-compatible client backed by `LLMClient`.
- Registered the first Moonshot/Kimi (`kimi-k2`, `kimi-k2-turbo`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`) and MiniMax (`minimax-m2`) models so you can swap between those providers without custom API wrappers.
- Added regression tests for the new models (`tests/models/test_kimi_and_minimax.py`) to make sure they stay callable.
0.0.67 · 2025-10-31
- Hardened `OpenAIResponsesRequest.handle_response()` so truncated/incomplete streaming payloads now produce actionable error messages (with the provider's `incomplete_details`) instead of JSON parsing failures, and fixed a dangling await in the OpenAI client path.
- Added dedicated coverage in `tests/core/test_incomplete_response.py` for both the incomplete and the successful response paths.
0.0.66 · 2025-10-31
- When you pass MCP server dictionaries (with a `url` key) through `tools` for Anthropic models, the client now automatically moves them into the `mcp_servers` array and sets the right beta header, so Anthropic's MCP integration works without any manual request massaging.
0.0.65 · 2025-10-30
- Tightened the strict-mode JSON Schema generator for tools: when `strict=True`, nested object schemas (including those inside `$defs`) have `additionalProperties: false`, defaults are stripped, and every property is marked `required`, matching OpenAI's schema contract.
- Backed the change with new tests in `tests/core/test_tool_defs.py` to ensure tools with and without `$defs` serialize correctly.
0.0.64 · 2025-10-30
- Added first-class `$defs`/`definitions` support to `Tool` plus the MCP loader so complex tool schemas with references survive serialization.
- `Tool.for_openai_completions()` now automatically includes `$defs`, rejects schemas that can't run in strict mode, and sets `additionalProperties: false` so OpenAI's strict JSON Schema validation passes out of the box.
0.0.63 · 2025-10-30
- `SamplingParams` and `LLMClient` accept `reasoning_effort="minimal"` (and `"none"`) so you can target the more efficient reasoning tiers exposed by OpenAI without hand-editing objects.
- Added regression coverage in `tests/core/test_reasoning_effort_minimal.py`.
0.0.62 · 2025-10-23
- `Message.with_file()`/`add_file()` now accept existing `File` objects, letting you build up prompts from pre-signed files without duplicates.
- Added `Message.with_remote_file()` to turn local bytes/paths into provider-hosted files asynchronously (with provider guard rails), making it easy to keep Anthropic/OpenAI file references in sync when constructing conversations.
Looking for something older? Run `git log --oneline` or inspect the GitHub release feed; this page will continue to backfill as new releases ship.