Changelog
LM Deluge iterates quickly. This page calls out the highlights from the most recent releases. For a blow-by-blow history you can always inspect `git log`, but the sections below should help you catch up, starting with v0.0.62.
0.0.138 · 2026-04-08
- Cloudflare Workers AI provider: New provider with 6 models — Kimi K2.5, GLM-4.7-Flash, GPT-OSS-120B, Llama 4 Scout, Gemma 4 26B, and Nemotron 3 120B. Uses Cloudflare's OpenAI-compatible endpoint with a thin custom handler for account ID injection. Requires the `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` env vars. Model IDs are suffixed `-cf` (e.g. `kimi-k2.5-cf`, `llama-4-scout-cf`).
- Gemini batch processing: Added `submit_batches_gemini` for Gemini's native batch API with file upload, polling, and result retrieval. Integrates with the existing `submit_batch_job`/`wait_for_batch_job` client methods.
- GPT-5.4 mini and nano: Added `gpt-5.4-mini` ($0.75/$4.50) and `gpt-5.4-nano` ($0.20/$1.25) to the OpenAI registry with xhigh reasoning and verbosity support.
- Arcee Trinity models: Added `trinity-large-thinking` and `trinity-large-preview` reasoning models from Arcee AI.
0.0.136 · 2026-03-17
- GPT-5.4 mini and nano: Added `gpt-5.4-mini` and `gpt-5.4-nano` to the OpenAI registry with the same reasoning and capability handling as `gpt-5.4`, plus the new pricing tiers. Added a live one-off smoke test that exercises both models against the OpenAI API and verifies cost accounting.
0.0.135 · 2026-03-09
- Audio file type inference: When passing raw `bytes` to `transcribe_async`/`transcribe_sync`, the library now uses `filetype` to detect the actual audio format (mp3, ogg, flac, etc.) instead of blindly assuming WAV. A new `AudioSource` type alias also accepts `(bytes, filename)` tuples so callers can preserve the original format and extension when the bytes come from an in-memory source.
0.0.133 · 2026-03-08
- Audio transcription: New `lm_deluge.transcribe` module — a lightweight, standalone transcription client (like `embed`). Supports OpenAI (`whisper-1`, `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`), Mistral (`voxtral-mini-latest`), Fireworks (`whisper-v3`, `whisper-v3-turbo`), and Deepgram (`nova-3`, `nova-2`). Parallel batch transcription with rate limiting, retries, cost tracking, and progress bars. Use `transcribe_async`/`transcribe_sync` with file paths, `Path` objects, or raw bytes.
- Auto-splitting for long audio: When a file exceeds a model's duration or file-size limit, the module automatically splits it into chunks via ffmpeg, transcribes the chunks in parallel, and stitches the results back together — timestamps, segments, and words are all adjusted to reflect the original file's timeline. If ffmpeg isn't installed and the file is within limits, it sends directly; if the file exceeds limits and ffmpeg is missing, a clear error message tells the user what to do.
- `TranscriptionResponse`: Results include `.text`, `.language`, `.duration`, `.segments` (list of `TranscriptionSegment` with start/end/speaker), and `.words`. Duration and language are filled in from ffprobe when the API doesn't return them (e.g. the gpt-4o-transcribe models).
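Stitching chunked transcripts back onto the original timeline amounts to offsetting each chunk's timestamps by the total duration of the chunks before it. A simplified sketch, not the library's actual implementation (the segment dicts and `stitch_segments` are illustrative):

```python
def stitch_segments(chunks: list, chunk_durations: list) -> list:
    """Merge per-chunk segment dicts into one timeline by offsetting start/end times."""
    merged, offset = [], 0.0
    for segments, duration in zip(chunks, chunk_durations):
        for seg in segments:
            merged.append({**seg, "start": seg["start"] + offset, "end": seg["end"] + offset})
        offset += duration  # the next chunk starts where this one ended in the original file
    return merged
```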
0.0.131 · 2026-03-05
- GPT-5.4 models: Added `gpt-5.4` ($2.50/$15) and `gpt-5.4-pro` ($30/$180) — OpenAI's latest reasoning models with image support and `xhigh` reasoning effort.
- Verbosity parameter: New `verbosity` param on `LLMClient` and `SamplingParams` as a unified alias for output effort control. Maps to OpenAI's native `verbosity` API field (low/medium/high) on GPT-5+ models, and to Anthropic's `output_config.effort` on Claude 4.5+/4.6 models. Cross-provider values are normalized automatically (e.g. `"max"` → `"high"` for OpenAI with a warning). `verbosity` and `global_effort` stay synced — setting one sets the other, and providing conflicting values raises `ValueError`.
- Lazy effort defaults: `global_effort` no longer defaults to `"high"` in `SamplingParams`. Instead, the Anthropic request builder applies the `"high"` default only for models that support GA effort, avoiding unnecessary params for other providers.
- Warning cleanup: Anthropic's `json_mode`-without-schema warning now fires once via `maybe_warn` instead of printing every time. New warning keys for unsupported verbosity/effort combinations.
- Proxy server verbosity: The OpenAI-compatible proxy server now accepts and forwards the `verbosity` field in chat completion requests.
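The cross-provider normalization described above can be sketched as a small clamp-and-validate step. This is an illustrative sketch under stated assumptions (`normalize_verbosity` is hypothetical; the real library also syncs the value with `global_effort` and emits warnings):

```python
def normalize_verbosity(value: str, provider: str) -> str:
    """Normalize a unified verbosity value for a given provider (illustrative sketch)."""
    # Assumption: OpenAI's verbosity field accepts only low/medium/high,
    # while Anthropic's effort scale also has a "max" tier.
    allowed = {"low", "medium", "high"} if provider == "openai" else {"low", "medium", "high", "max"}
    if provider == "openai" and value == "max":
        return "high"  # clamped; the real library prints a warning here
    if value not in allowed:
        raise ValueError(f"unsupported verbosity {value!r} for {provider}")
    return value
```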
0.0.130 · 2026-03-03
- Session sharing: Batch requests (`process_batch`, `batch_agent_loop`) now share a single `aiohttp.ClientSession` across all tasks, reducing connection overhead and improving throughput for large batches. The connector pool is sized with headroom above `max_concurrent_requests` so aiohttp never becomes the hidden bottleneck.
- Improved rate-limit dispatch: Replaced `_wait_for_capacity` with `_consume_rate_capacity` — rate-limit waiting now happens before acquiring the concurrency semaphore, fixing head-of-line blocking for mixed-size requests. Wait times are computed dynamically from the RPM/TPM deficit instead of fixed sleeps. Requests that can never fit the TPM budget raise immediately with a clear error instead of hanging forever.
- Graceful exception handling in batches: `process_batch` and `batch_agent_loop` now use `return_exceptions=True` in `asyncio.gather`, so a single failing task no longer cancels all siblings. Exceptions are converted to error `APIResponse` objects, giving callers a uniform list back.
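The deficit-based wait computation can be sketched as follows: given the tokens a request needs, the tokens currently available, and the per-minute refill budget, the wait is the deficit divided by the refill rate, and a request larger than the whole budget fails immediately. Names here are illustrative, not lm_deluge internals:

```python
def seconds_until_capacity(needed_tokens: int, available_tokens: float, tokens_per_minute: int) -> float:
    """How long to wait before `needed_tokens` fit the TPM budget (illustrative sketch)."""
    if needed_tokens > tokens_per_minute:
        # No amount of waiting helps: the request can never fit the budget.
        raise ValueError(f"request needs {needed_tokens} tokens but the budget is {tokens_per_minute}/min")
    deficit = needed_tokens - available_tokens
    if deficit <= 0:
        return 0.0
    return deficit / (tokens_per_minute / 60.0)  # refill rate in tokens per second
```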
0.0.129 · 2026-03-03
- Gemini 3.1 Flash Lite: Added `gemini-3.1-flash-lite-preview` (native Gemini API) and `gemini-3.1-flash-lite-compat` (OpenAI-compatible endpoint). Google's most cost-efficient multimodal model — $0.25/M input, $1.50/M output. Supports thinking, structured outputs, and function calling (native endpoint only; the OpenAI-compat endpoint doesn't support tool calling for Gemini 3.x due to thought signature requirements).
0.0.128 · 2026-03-03
- Dynamic-filtering web search tool: Added `web_search_tool_dynamic()` for Anthropic's new `web_search_20260209` tool, which lets Claude write and execute code to filter search results before they enter the context window. Only supported on Claude Opus 4.6 and Sonnet 4.6 — raises `ValueError` for unsupported models. The API auto-injects the code execution tool, so you don't need to pass it separately. The existing `web_search_tool()` (`web_search_20250305`) remains available for all models.
- `user_location` for web search: Both `web_search_tool()` and `web_search_tool_dynamic()` now accept a `user_location` dict to localize search results by city, region, country, and timezone.
- Graceful `ClientOSError` handling: `aiohttp.ClientOSError` (connection reset, broken pipe, etc.) now returns a structured error `APIResponse` instead of an unhandled exception traceback.
0.0.127 · 2026-03-02
- Bedrock API key authentication: Bedrock now supports the new AWS Bedrock API keys alongside the existing SigV4 (access key + secret) auth flow. Set `BEDROCK_API_KEY`, `AWS_BEDROCK_API_KEY`, or `AWS_BEARER_TOKEN_BEDROCK` to use simple Bearer-token auth — no `requests-aws4auth` dependency needed. When both an API key and IAM credentials are present, the API key takes precedence. Auth logic is consolidated into a shared `bedrock_auth` module, removing duplication across the Anthropic, OpenAI, and Nova Bedrock request builders.
- JustBashSandbox: New cross-platform sandbox powered by Vercel's just-bash. Provides read-scoped, network-disabled, copy-on-write execution without Docker or platform-specific tooling. Supports configurable `root_dir`, `working_dir`, read-only mode, optional Python via `enable_python`, auto-install of the `just-bash` CLI, background process tracking, and the same `bash`/`list_processes` tool interface as other sandboxes. See Tool Use for usage.
0.0.126 · 2026-02-27
- Inception (Mercury 2) support: Added the `mercury-2` model from Inception Labs — a fast diffusion-based LLM with 128K context. Uses the OpenAI-compatible endpoint at `api.inceptionlabs.ai`. Supports tool calling, structured outputs, and `reasoning_effort` with Mercury-specific mapping (`"none"`/`"minimal"` → `"instant"` mode for near-zero-latency responses). Requires the `INCEPTION_API_KEY` env var.
0.0.125 · 2026-02-27
- Bedrock updates: The Anthropic Bedrock provider deprecated older models that are no longer supported, added more US regions for US cross-region inference, and added support for Global cross-region inference via `-global` models that work across many regions for Claude 4 onwards. This allows much higher TPM/RPM than the previous setup, which only used us-west. Global is opt-in: choose the `-global`-suffixed models. Make sure all regions are enabled in your AWS account to avoid errors from disabled regions.
0.0.124 · 2026-02-26
- Automatic prompt caching: New `cache="automatic"` pattern that sets the top-level `cache_control` flag on Anthropic requests, letting the provider decide what to cache instead of manually specifying cache breakpoints. Supported in `LLMClient`, `run_agent_loop`, and the proxy server (`DELUGE_CACHE_PATTERN=automatic`). Bedrock does not support this mode and will emit a warning and fall back to no caching if `automatic` is requested.
- PybubbleSandbox: New Linux sandbox backed by pybubble (bubblewrap). Provides filesystem and process isolation without Docker — just needs `bwrap` installed. Supports configurable network access (`network_access`, `outbound_access`, `allow_host_loopback`), optional fallback to host-network sharing in restricted runtimes (requires `allow_host_loopback=True`), background process tracking, and the same `bash`/`list_processes` tool interface as other sandboxes. Requires the `pybubble` dependency and Linux. See Tool Use for usage.
0.0.123 · 2026-02-25
- Agent loop final-turn warning: When `run_agent_loop` reaches its last round (`max_rounds`), a user message is now injected telling the model it must return a text response and cannot call any more tools. This prevents agents from wasting the final turn on tool calls that will never be executed, making `SubAgentManager` and agent loops in general more reliable.
0.0.122 · 2026-02-24
- New `VectorDBManager` prefab tool: In-process vector database for agents, backed by numpy with brute-force cosine similarity search. Exposes `insert`, `search`, `get`, `delete`, `count`, and `list` commands. Supports a pluggable `VectorDBBackend` ABC so you can swap in heavier stores (USearch, turbopuffer, etc.) without changing tool wiring. Ships with `InProcessVectorDB` for small-to-medium collections (~100k vectors). Available via the new `vector_db` optional extra (`pip install lm_deluge[vector_db]`).
- Retry-After header support for rate limiting: All providers (Anthropic, OpenAI, Gemini, Bedrock, Bedrock Nova, Mistral) now parse `retry-after` and `retry-after-ms` response headers on 429s and use the server-suggested cooldown duration instead of a fixed pause. Capped at `MAX_COOLDOWN_SECONDS` for safety.
- Improved cooldown logging: Rate-limit pause messages now print once per cooldown event (not once per waiting task) and show the actual pause duration, reducing log noise during heavy parallel runs.
- Relaxed `markdownify-rs` version: Changed from pinned `==0.1.1` to `>=0.1.1`.
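Brute-force cosine search over an in-memory collection is simple enough to sketch in pure Python. This is illustrative only; `InProcessVectorDB` uses numpy and a richer record format:

```python
import math

def cosine_top_k(query: list, vectors: dict, k: int = 3) -> list:
    """Score every stored vector against the query and return the top-k (id, score) pairs."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    scored = [(key, cos(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]
```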
0.0.120 · 2026-02-20
- New `SqliteManager` prefab tool: Schema-first SQLite tool for agents — supports `list_tables`, `describe_table`, and `query` commands with progressive disclosure. Build a DB from a list of dicts via `SqliteManager.from_dicts()` (auto-infers column types, including JSON columns) or point at an existing `.db` file. Supports read-only mode, parameterized queries, multiple output formats (JSON, YAML, CSV, TSV), row-count truncation, and sample rows. See Tool Use for usage.
- Retired `claude-3.5-haiku`: Removed `claude-3.5-haiku` (`claude-3-5-haiku-20241022`) from the model registry — the model has been retired by Anthropic. Use `claude-4.5-haiku` instead. Remaining references to 3.5-haiku in examples and tests have been updated to 4.5-haiku.
0.0.119 · 2026-02-19
- Embedding rate limiting: `embed_parallel_async` now accepts `max_requests_per_minute` and `max_tokens_per_minute` parameters, using the same `StatusTracker` capacity system as `LLMClient`. Rate-limit (429) responses trigger automatic cooldown. Previously these were accepted as `**kwargs` and leaked into the API payload, causing request failures.
- Deterministic test suite: Added `tests/core/test_embed.py` with 15 tests covering request building, response parsing, payload isolation (control kwargs never leak to the API), rate-limit param acceptance, and edge cases — no live API calls required.
0.0.118 · 2026-02-19
- Embeddings rewrite: Completely rewrote `lm_deluge.embed` — the old implementation was broken (it crashed immediately due to a serialization bug). The new module uses `asyncio.gather` + `Semaphore` for clean parallel batching with per-request sessions, exponential backoff retries, and live cost/token tracking in the progress bar.
- Cohere v2 API: Switched Cohere embeddings from the deprecated v1 to the v2 endpoint (`/v2/embed`).
- New model `embed-v4.0`: Added Cohere's latest embedding model with configurable output dimensions (256, 512, 1024, 1536) and a 128k context window.
- Cost tracking: Embeddings now track tokens and cost live in the tqdm progress bar and print a summary on completion. Each `EmbeddingResponse` includes a `tokens_used` field. Updated pricing for all models.
- `embed_sync` helper: New synchronous convenience wrapper that returns a flat list of embedding vectors.
- New docs page: Added Embeddings documentation with a model table, examples, and a configuration reference.
0.0.116 · 2026-02-17
- Claude Sonnet 4.6 support: Added `claude-4.6-sonnet` (`claude-sonnet-4-6`) with $3/$15 pricing, GA structured outputs, image input, and reasoning support. Also added Bedrock entries for both 4.6 models (`claude-4.6-opus-bedrock`, `claude-4.6-sonnet-bedrock`).
- Adaptive thinking default for all 4.6 models: Both Opus 4.6 and Sonnet 4.6 now default to `thinking: {type: "adaptive"}` when no explicit `thinking_budget` is set. Explicit `budget_tokens` still works but emits a deprecation warning.
- GA effort parameter for Sonnet 4.6: `global_effort` and `reasoning_effort` now map to `output_config.effort` for Sonnet 4.6 (previously only Opus 4.5/4.6). The `-low`, `-medium`, `-high` model name suffixes work as expected (e.g. `claude-4.6-sonnet-medium`).
- Prefill blocking for Sonnet 4.6: Assistant message prefill is now rejected for all 4.6 models (was previously Opus-only), matching the upstream API behavior.
- Model aliases: Models can now define an `aliases` list so common alternative names resolve to the same model. For example, `claude-sonnet-4-6` (the API name) now resolves to `claude-4.6-sonnet`, and `claude-haiku-4.5` resolves to `claude-4.5-haiku`. Aliases work with reasoning suffixes too (e.g. `claude-sonnet-4-6-high`). All Anthropic models with differing API names have aliases configured.
- Duplicate model registration warning: `register_model()` now prints a warning when a model id or alias collides with an existing registry entry, catching configuration bugs at import time.
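Alias resolution boils down to an index that maps every alias and canonical id to the canonical id, with collisions flagged at registration time. A hedged sketch (`build_alias_index` is illustrative, not the library's API, and suffix handling is omitted):

```python
def build_alias_index(models: dict) -> dict:
    """Map every canonical model id and each of its aliases to the canonical id."""
    index = {}
    for model_id, spec in models.items():
        index[model_id] = model_id
        for alias in spec.get("aliases", []):
            if alias in index:
                # The real register_model() prints a warning on collisions like this.
                print(f"warning: alias {alias!r} collides with an existing registry entry")
            index[alias] = model_id
    return index
```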
0.0.115 · 2026-02-12
- File and image URL passthrough: `File` and `Image` objects now accept HTTP(S) URLs as `data` and pass them directly to providers that support it (OpenAI Responses API, Anthropic, Google Gemini), avoiding unnecessary download and base64-encoding. Providers that don't support URL passthrough (OpenAI Chat Completions for files, Mistral, Nova) automatically fall back to downloading and base64-encoding.
- Fixed OpenAI Responses `input_file` deserialization: `Conversation.from_openai_chat()` now correctly parses `input_file` blocks where `file_url`, `file_data`, and `filename` are at the top level of the block (as emitted by the Responses API), not nested inside a sub-dict.
0.0.114 · 2026-02-12
- Dropped `tiktoken` dependency: Removed `tiktoken` from project dependencies (`pyproject.toml` and `requirements.txt`) and switched `Conversation.count_tokens()` to a lightweight heuristic (`len(text) // 4`) for text token estimation.
- Fixed OpenAI Responses usage parsing: `Usage.from_openai_usage()` now supports both OpenAI shapes: Chat Completions (`prompt_tokens`/`completion_tokens`, `prompt_tokens_details`) and Responses API (`input_tokens`/`output_tokens`, `input_tokens_details`), including cache-read token extraction in both cases.
- Added coverage for usage/cost correctness:
  - Extended `tests/core/test_incomplete_response.py` to assert usage mapping for Responses payloads and added a direct compatibility test for both OpenAI usage formats.
  - Added a new live test, `tests/models/test_gpt_5_2_responses_cost_live.py`, that uses `dotenv`, calls `gpt-5.2` with `use_responses_api=True`, and verifies non-zero usage/cost plus exact usage-to-cost reconciliation against model pricing.
- Test cleanup: Minor formatting/type-hint cleanup in `tests/models/test_xhigh_reasoning.py` to keep lint/type checks clean.
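The replacement heuristic is the classic rule of thumb of roughly four characters per token for English text:

```python
def estimate_tokens(text: str) -> int:
    """Cheap token estimate: ~4 characters per token for English text, minimum 1."""
    return max(1, len(text) // 4)
```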
0.0.113 · 2026-02-11
- Anthropic tool schemas strip unsupported constraints: Tool schemas sent to Anthropic now automatically remove numeric constraints (`minimum`, `maximum`, etc.) that the API rejects, folding them into the property's `description` so the model still sees the intent. This applies to `Tool.for_anthropic()` (both strict and non-strict modes) and to raw dict tool definitions passed through `tools=`, including nested `custom` tool schemas. Caller-provided dicts are never mutated.
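Folding unsupported constraints into the description can be sketched as a recursive copy that moves the offending keys into a parenthetical note. A simplified illustration (the real implementation handles arrays and more keywords; copying, rather than mutating, is what keeps caller dicts intact):

```python
UNSUPPORTED = ("minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum")

def strip_numeric_constraints(schema: dict) -> dict:
    """Return a copy of `schema` with numeric constraints folded into the description."""
    out, notes = {}, []
    for key, value in schema.items():
        if key in UNSUPPORTED:
            notes.append(f"{key}: {value}")  # preserve the intent as prose
        elif isinstance(value, dict):
            out[key] = strip_numeric_constraints(value)  # recurse into nested schemas
        else:
            out[key] = value
    if notes:
        desc = out.get("description", "")
        out["description"] = (desc + " " if desc else "") + "(" + ", ".join(notes) + ")"
    return out
```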
0.0.112 · 2026-02-09
- JSON healing: escape unescaped interior quotes: `load_json` now detects and escapes double quotes inside JSON string values that weren't properly escaped (e.g. `"the agent ("attorney-in-fact") is authorized"`). Uses structural context — a quote only counts as a real string terminator if the next non-whitespace character is a JSON delimiter (`,`, `}`, `]`, `:`). Tried as a later fallback after bracket/comma healing to avoid false positives on valid JSON.
0.0.111 · 2026-02-08
- Immediate error on impossible token budget: Requests whose estimated token count (prompt + `max_new_tokens`) exceeds `max_tokens_per_minute` now raise a `ValueError` immediately instead of hanging forever waiting for capacity that can never be granted.
0.0.110 · 2026-02-07
- Added full Claude Opus 4.6 request support in the Anthropic builder: adaptive thinking (`thinking: {type: "adaptive"}`), 128k-style large-output compatibility plumbing, `inference_geo` passthrough, and Opus 4.6 assistant-prefill rejection behavior.
- Migrated Opus effort handling to the GA shape for both `claude-4.5-opus` and `claude-4.6-opus` via `output_config.effort` (including support for `global_effort="max"`), removing reliance on the old effort beta header path.
- Added compatibility passthrough for the deprecated Anthropic `output_format` by mapping it to `output_config.format`, while preserving native `output_config.format` structured outputs.
- Expanded proxy/server Anthropic compatibility models and adapters to parse/forward `output_config`, the deprecated `output_format`, and `inference_geo`.
- Added/updated regression coverage in `tests/models/test_anthropic_thinking_budget.py`, `tests/core/test_server_adapters.py`, and `tests/core/test_new_llmclient_api.py`.
- Added live network validation in `tests/one_off/test_anthropic_opus_46_features_live.py` (loads `.env` via `dotenv.load_dotenv()`), covering Opus 4.5/4.6 effort, adaptive thinking, `inference_geo`, the deprecated `output_format`, and prefill rejection.
0.0.108 · 2026-02-04
- OpenAI Responses API now supports images in tool results: Tool results containing `[Text(...), Image(...)]` lists are now properly serialized as arrays with `input_text` and `input_image` types, allowing models to see images returned by tools natively.
- Fixed OpenAI Chat Completions image extraction: Tool results with images now correctly append a user message containing the extracted images (previously the message was created but never added to the request).
- Image detail field: `Image.oa_resp()` now includes the `detail` field for controlling image processing fidelity.
0.0.107 · 2026-02-04
- Anthropic structured outputs are GA: Structured outputs now use `output_config.format` (no beta header), and Claude 4.5 models are marked JSON-capable.
- FilesystemManager zip URLs: `FilesystemManager.from_zip()` now accepts `http(s)` URLs to preload a workspace from a remote zip file.
- Curl tool jq piping: `get_curl_tool()` now allows piping output to `jq` for JSON filtering while continuing to block other shell pipes.
0.0.105 · 2026-02-02
- New `get_curl_tool()` prefab: A lightweight curl tool for making HTTP requests without needing a full sandbox. Validates commands to prevent shell injection, whitelists safe flags, and blocks requests to localhost/private IPs. Pair it with `FilesystemManager` for agents that need to fetch data and read/write files. See Tool Use for usage.
- Verbose mode for agent loops: Pass `verbose=True` to `run_agent_loop()`, `run_agent_loop_sync()`, `start_agent_loop_nowait()`, or the batch variants to print each tool call and result as the agent runs. Long arguments and outputs are automatically truncated for readability.
0.0.104 · 2026-02-02
- Lazy imports for prefab tools: The `lm_deluge.tool.prefab` module now uses lazy imports, so importing one tool (e.g., `ModalSandbox`) no longer requires dependencies for unrelated tools (e.g., `lenlp`/`tantivy` for `FullTextSearchManager`). Each tool's dependencies are only loaded when that specific tool is used.
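Module-level lazy imports typically rely on PEP 562's module `__getattr__`: the expensive import runs only when the attribute is first accessed. A self-contained sketch of the pattern (illustrative; `make_lazy_module` and the factory mapping are not lm_deluge's actual code):

```python
import types

def make_lazy_module(name: str, factories: dict) -> types.ModuleType:
    """Build a module whose attributes are created on first access (PEP 562 __getattr__)."""
    mod = types.ModuleType(name)
    cache = {}

    def __getattr__(attr):
        if attr not in factories:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        if attr not in cache:
            cache[attr] = factories[attr]()  # the heavy import would happen here, once
        return cache[attr]

    mod.__getattr__ = __getattr__
    return mod
```

In a real package the same effect comes from defining `__getattr__` at module level in `__init__.py` and importing the submodule inside it.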
0.0.103 · 2026-01-31
- Model fallbacks, blocklisting, and stickiness: Multi-model clients now support intelligent fallback behavior. Configure `prefer_model="claude-4-sonnet"` to always try your preferred model first with automatic failover, or use `prefer_model="last"` for multi-turn conversations to stick to whichever model was used previously (survives serialization via `conv.model_used`). Models that fail with unrecoverable errors (401, 403, 404) are automatically blocklisted for the client's lifetime, while rate limits and server errors trigger retries. Agent loops automatically maintain stickiness across tool-calling rounds.
- Added new models: `kimi-k2.5` (Moonshot), `glm-4.7-flash-openrouter`, `trinity-large-openrouter`, and `kimi-k2.5-openrouter` via OpenRouter.
- Added pricing metadata to existing Kimi models (`kimi-k2`, `kimi-k2-turbo`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`).
- New documentation page: Model Fallbacks & Stickiness, covering the three key patterns (primary + fallback, load balancing, multi-turn stickiness).
0.0.102 · 2026-01-29
- Added ZAI (Zhipu AI) models: `glm-4.7`, `glm-4.7-flash`, `glm-4.6`, `glm-4.5`, and `glm-4.5-air` via the ZAI API with the Anthropic-compatible spec.
- Replaced the `fastmcp` and `mcp` dependencies with a minimal built-in MCP client implementation (`lm_deluge.mcp`), reducing the dependency footprint while supporting HTTP and stdio transports for tool listing and calling.
- Cleaned up JSON parsing: removed debug print statements from `try_load_json()`.
- Moved examples into proper documentation pages under `/docs/examples/`, covering batch processing, chat loops, computer use, and streaming.
0.0.101 · 2026-01-14
- Added a Claude Code skill (`lm_deluge.skill.SKILL.md`) with embedded usage documentation for Claude Code IDE integrations.
- Improved aiohttp client connector error messages with clearer diagnostics when a connection fails.
- Updated Slack notification formatting for better readability.
0.0.100 · 2026-01-10
- Fixed OpenAI Responses API handling of reasoning models with tools: reasoning content (summary blocks) now correctly serializes as `reasoning`-type items with the proper `summary` field structure, fixing issues where tool calls would fail on models like `o4-mini`.
- Added test coverage for Responses API reasoning + tool interactions in `tests/core/test_openai_responses_reasoning_tools.py`.
0.0.99 · 2026-01-10
Bugfixes for the OpenAI Responses API and client-side tools.
0.0.98 · 2026-01-10
- OpenAI Responses API now supports client-side tool execution: when you pass `Tool` objects (or local MCP servers with `force_local_mcp=True`) to `start`, `start_nowait`, `process_prompts_async`, etc., the client automatically runs an internal tool loop — calling your tools, collecting results, and continuing until the model finishes. Although this effectively runs an agent loop (for which dedicated methods already exist), it brings client-side tools into parity with how the Responses API behaves when no client-side tools are involved: you get back a completed response, with all the tool calls and reasoning that led to it. We decided that should match whether or not you have client-side tools, so just for the Responses API, tools are auto-run.
- Responses API response parsing now preserves raw item payloads (`raw_item`) in `ToolCall.extra_body` for function calls, MCP calls, web search, and other built-in tools, making it easier to reconstruct the exact request format when needed.
- `Thinking` parts from the Responses API now include `summary` and `raw_payload` fields for richer introspection.
- Agent loops (`run_agent_loop`) now raise `NotImplementedError` when `use_responses_api=True` to prevent confusion — use `start()` instead, which handles the tool loop automatically.
- Added test coverage for Responses API tool call handling in `tests/core/test_openai_responses_tool_calls.py`.
0.0.97 · 2026-01-10
- Fixed GPT-5 reasoning effort defaults: GPT-5 models no longer special-case to `minimal` effort when none is specified; they now follow the standard `low` default like other reasoning models.
- Enabled JSON mode support (`supports_json: True`) for the GPT-5 Codex variants (`gpt-5.1-codex`, `gpt-5.1-codex-mini`, `gpt-5-codex`) and `gpt-5-chat-latest`.
- Updated the Cerebras model catalog: added `glm-4.7-cerebras` (ZAI GLM 4.7); temporarily disabled preview models (`llama-4-scout`, `llama-4-maverick`, `qwen-3-235b-thinking`, `qwen-3-coder`) pending availability.
0.0.96 · 2026-01-07
- Agent loops now accept an `on_round_complete` callback on `run_agent_loop()`/`run_agent_loop_sync()`/`start_agent_loop_nowait()` (and the batch agent loop helpers) for per-round hooks.
- New tool execution helpers: `execute_tool_calls()` (plus `Tool.find()`) for running `ToolCall`s locally and collecting `(tool_call_id, result)` tuples.
- Conversation ergonomics: `Conversation.with_tool_results()` for adding tool outputs in bulk, and `with_tool_result()`/`Message.with_tool_result()` now accept dict results.
- Added core test coverage for agent loop callbacks and the new tool helper utilities.
0.0.95 · 2026-01-06
- Added a `PhilipsHueManager` prefab (`lm_deluge.tool.prefab`) for controlling Philips Hue lights via the local bridge API (list lights, on/off, color, brightness; `HUE_BRIDGE_IP` + `HUE_API_KEY`).
- Added an experimental `lm_deluge.pipelines.heartbeat` starter for running a model on a schedule.
- Added a one-off live test for `PhilipsHueManager` (`tests/one_off/test_philips_hue_live.py`).
0.0.94 · 2026-01-01
- Added `get_response_files()` to `lm_deluge.util.anthropic_files` to download Anthropic response files in-memory (optionally resolving real filenames via metadata).
- Anthropic requests now populate `APIResponse.finish_reason` from `stop_reason`.
- `Message.user(..., file=...)` now accepts a `File` object directly.
- Added a one-off regression test for Anthropic `finish_reason` parsing (`tests/one_off/test_anthropic_finish_reason.py`).
0.0.93 · 2026-01-01
- Added Anthropic Skills support: pass `skills=[Skill(...)]` to `start()`, `run_agent_loop()`, or batch methods to use Anthropic's built-in skills (xlsx, pptx) or custom uploaded skills.
- New `Skill` class (`lm_deluge.Skill`) for defining skills with `type` (anthropic/custom), `skill_id`, and `version`.
- File download utilities in `lm_deluge.util.anthropic_files`: `download_anthropic_file()`, `save_response_files()`, and `get_anthropic_file_metadata()` for retrieving files generated by skills.
- `ToolResult` now includes a `files` field for code execution outputs, and a `ContainerFile` TypedDict for file metadata.
- Container ID reuse: `container_id` parameter on `start()`/`run_agent_loop()` and automatic reuse within agent loops to maintain state across turns.
- Skills documentation page added to the docs site.
0.0.92 · 2026-01-01
- Added Amazon Nova support on Bedrock (new request handler, model registry entries, prompt/tool/image serialization, and cache-point handling).
- Expanded the Azure catalog with OpenAI-compatible model definitions, dotenv-aware `AZURE_URL` lookup, and the Responses API enabled only for OpenAI-family models.
- Added Tavily and Brave web search managers plus a configurable `WebSearchManager` to mix search/fetch backends.
- Tavily extract now guarantees markdown output by converting HTML responses with markdownify when needed.
- New one-off test suites for Nova Bedrock, Azure models, and Tavily/Brave/WebSearchManager coverage.
0.0.91 · 2025-12-28
- CLI overhaul: installable `deluge` and `deluge-server` entrypoints with `list`, `run`, and `agent` subcommands, model filtering, JSON output, stdin/file inputs, image prompts, and MCP/prefab-enabled agent loops.
- Model registry now tracks provider and `supports_images` metadata and exposes `find_models()` for filtering/sorting by capabilities and cost.
- OpenRouter catalog expanded with NVIDIA Nemotron 3 Nano 30B (free/paid), Nemotron Nano 12B v2 VL (free/paid vision), Mistral Devstral 2 (free/paid), Xiaomi Mimo V2 Flash (free), AllenAI OLMo 3.1 32B Think (free), and a Trinity Mini free SKU. Removed retired Anthropic models.
- Prompt refactor: `Conversation.system`/`user` are now instance methods (use `Conversation().system(...).user(...)`), added `Conversation.ai`, and prompt primitives (`File`, `Image`, etc.) live under `lm_deluge.prompt` with top-level re-exports; `RequestContext` moved to `lm_deluge.api_requests.context`.
- MCP tooling adds `MCPServer.from_mcp_config` for Claude Desktop config parsing, and `MCPServer` is now exported at the top level.
- Dependencies trimmed: removed numpy/pandas; the embedding `stack_results()` now returns Python lists only; logprob utilities use `math`.
- Config cleanup: dropped `SamplingParams.to_vllm` and `ComputerUseParams`.
- Docs and repo hygiene: added proxy server docs + nav entry, refreshed README/examples for the new Conversation builder, added lint helper scripts (banned strings/weird spaces/max lines).
0.0.90 · 2025-12-26
- Proxy server adds a configurable model policy (allowlists, defaults, alias routes) with CLI/config support, optional request/provider logging, forwarded `anthropic-beta` headers, and richer Anthropic request support (thinking config, expanded content blocks).
- Added thought signature preservation for Gemini 3 and Anthropic responses (including redacted thinking), with updated adapters and tests.
- Sandbox prefabs reorganized into a package and expanded with a macOS-only SeatbeltSandbox and coverage for the new sandbox flows.
- Added `tinker://` OpenAI-compatible model auto-registration with multipart message flattening.
- Message/Conversation `to_log` and `from_log` can optionally preserve image/file bytes (base64) for round-trip serialization.
- OpenRouter catalog expands with `minimax-m2.1` plus free `gpt-oss-20b`/`gpt-oss-120b` entries.
- Provider compatibility fixes: Anthropic batch submission now posts JSON payloads (no temp JSONL), and Gemini tool schemas strip `additionalProperties`.
0.0.89 · 2025-12-17
- Added `gemini-3-flash-preview` model with v1alpha API endpoint and pricing ($0.50/$3.00 per million input/output tokens).
- Gemini 3 Flash supports `minimal` and `medium` thinking levels directly, unlike Gemini 3 Pro, which only supports `low` and `high`. The request builder now detects Flash vs. Pro and passes the appropriate `thinkingLevel` values.
- Added test coverage for Flash-specific thinking levels in `tests/models/test_gemini_3_thinking_level.py`.
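The Flash-vs-Pro detection can be pictured as a small lookup plus fallback. A hypothetical sketch (the helper name and the exact fallback for unsupported levels are assumptions, not lm-deluge's code):

```python
# Hypothetical helper illustrating the Flash-vs-Pro mapping described above.
FLASH_LEVELS = {"minimal", "low", "medium", "high"}
PRO_LEVELS = {"low", "high"}


def resolve_thinking_level(model: str, requested: str) -> str:
    allowed = FLASH_LEVELS if "flash" in model else PRO_LEVELS
    if requested in allowed:
        return requested
    # Lightweight efforts collapse to "low" (assumption for "medium" on Pro);
    # anything heavier collapses to "high".
    return "low" if requested in {"none", "minimal", "medium"} else "high"
```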
0.0.88 · 2025-12-15
- New Recursive Language Model (RLM) manager/pipeline brings a tool-driven REPL for very long contexts, with persistent state, guarded imports, `lm()` fan-out, and `final()`/`final_var()` completion; covered by new core and long-context suites (including a 1.5M-char Ulysses run).
- Added a Tantivy-powered FullTextSearch prefab (`search`/`fetch` tools) with query sanitization, optional dedupe, cached fetches, and a BrowseComp-Plus benchmark harness to stress it against ~100k docs.
- Expanded sandbox prefabs: Modal, Daytona, Docker, and Fargate sandboxes now expose bash/file/process/tunnel helpers with async context managers and background process tracking, plus one-off tests for Docker/Daytona cleanup paths.
- Packaging: introduced optional extras `full_text_search` (tantivy + lenlp) and `sandbox` (modal, daytona-sdk, docker) so heavyweight deps are opt-in; removed the unused `deduplicate_strategy` parameter from the FTS API.
0.0.87 · 2025-12-11
- Added `xhigh` reasoning effort support for GPT-5.2 and GPT-5.1-Codex-Max, the two models that support OpenAI's new extra-high reasoning tier. Other reasoning models automatically fall back to `high` with a warning.
- Model name suffixes now support `-xhigh` (e.g., `gpt-5.2-xhigh`) alongside the existing `-low`, `-medium`, `-high`, `-minimal`, and `-none` suffixes.
- Fixed GPT-5.2 and GPT-5.1-Codex-Max requests to omit `temperature` and `top_p` when reasoning is enabled, matching OpenAI's new API constraints for these models.
- Added `supports_xhigh` flag to `APIModel` for models that support the xhigh reasoning tier.
- Added comprehensive test coverage in `tests/models/test_xhigh_reasoning.py`.
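Suffix parsing of this kind has one subtle ordering requirement: `-xhigh` must be checked before `-high`, or it would be mis-split. A sketch under that assumption (the real parser lives inside lm-deluge and may differ in names and details):

```python
# Longest/most-specific suffixes first, so "-xhigh" is not caught by "-high".
EFFORT_SUFFIXES = ("-xhigh", "-high", "-medium", "-low", "-minimal", "-none")


def split_effort_suffix(model_name: str):
    """Return (base_model, effort); effort is None when no suffix matches."""
    for suffix in EFFORT_SUFFIXES:
        if model_name.endswith(suffix):
            return model_name[: -len(suffix)], suffix[1:]
    return model_name, None
```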
0.0.86 · 2025-12-04
- Fixed a critical bug in the agent loop where `conversation.with_tool_result()` wasn't being reassigned, causing tool results to be silently dropped from the conversation history.
- OpenAI web search tool now defaults to GA mode (`preview=False`) instead of preview.
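The dropped-tool-result bug is an instance of a classic pitfall with immutable builders: the method returns a new object, so forgetting to reassign silently discards the update. An illustrative reproduction (names are stand-ins, not lm-deluge's internals):

```python
class Conversation:
    def __init__(self, messages=()):
        self.messages = list(messages)

    def with_tool_result(self, result):
        # Returns a *new* conversation; the original is untouched.
        return Conversation(self.messages + [("tool", result)])


convo = Conversation()
convo.with_tool_result("42")          # bug: the returned object is discarded
assert convo.messages == []
convo = convo.with_tool_result("42")  # fix: reassign the returned value
assert convo.messages == [("tool", "42")]
```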
0.0.85 · 2025-12-04
- Added `max_content_chars` parameter to `ExaWebSearchManager` for controlling response size.
- Enhanced OpenAI built-in web search tool with better configuration options.
- Added comprehensive test coverage for OpenAI web search in `tests/core/test_openai_web_search.py`.
0.0.84 · 2025-12-04
- Added TryCua integration for computer use agents, with full executor implementation supporting screenshots, clicks, typing, scrolling, and multi-step tasks.
- Added Anthropic built-in web search tool support with test coverage.
- Added Gemini computer use via Kernel executor with dedicated test suite.
- Added batch agent loops capability for running multiple agent conversations in parallel.
- Registered new Gemini models including `gemini-2.5-pro` and `gemini-2.5-flash`.
0.0.83 · 2025-12-02
- More prefab tools: added Google Docs tools (metadata, ranged reads/grep, markdown-aware insert/update/replace/delete) and Google Sheets tools (list tabs, find used ranges, read ranges as HTML tables, update cells) with service-account auth. Added an Exa Web Search tool (more web search tools coming), an AWS SES Email tool, and S3-backed filesystem and memory.
- Model catalog: registered Arcee `trinity-mini` (native, OpenRouter, Together), refreshed DeepSeek pricing plus new reasoner/speciale variants (including an Anthropic-compatible path), and marked Kimi thinking SKUs as reasoning models with a warning when thinking is disabled.
- Client knobs & coverage: `LLMClient` now accepts `global_effort` and `thinking_budget` at construction so Anthropic-style requests carry the right effort settings, and new suites cover the prefab tools, Arcee tool-calling, DeepSeek Speciale, and S3 integrations.
0.0.82 · 2025-11-30
- Added `LLMClient.print_usage()` and refactored `StatusTracker.log_usage()` so you can dump cumulative token/cost/time stats mid-run; final status output now reuses the same usage reporter.
- Drafted a GEPA pipeline implementation plan (`src/lm_deluge/pipelines/gepa/GEPA_IMPLEMENTATION_PLAN.md`) outlining how to port the GEPA optimizer onto lm-deluge.
0.0.81 · 2025-11-26
- Tooling overhaul: `Tool.from_function` now uses Pydantic `TypeAdapter` for schemas, supports `Annotated[...]` descriptions, extracts a return-type `output_schema` (with optional runtime validation), and auto-converts TypedDict/Pydantic params. Serialization still honors strict/non-strict modes automatically.
- New prefab helpers: `ToolComposer` (OTC) for code-based tool orchestration, `BatchTool` for bundling calls, `ToolSearchTool` for regex discovery + invocation, and `MemoryManager` for long-lived notes. Todos/subagents/filesystem managers stay available under `lm_deluge.tool.prefab`.
- Pipelines split: `extract`, `translate`, and `score_llm` now live in `lm_deluge.pipelines`.
- Modal sandbox `bash` drops timeouts for background commands and exposes `bash`/`list_processes`/`get_url` (network optional); docs updated accordingly.
- Agent ergonomics: `Conversation.print()` pretty-prints conversations with truncation, and Open Tool Composition prompts now render available tool signatures correctly.
- Robustness: Anthropic requests now map `global_effort` to `output_config.effort`, and aiohttp `ServerDisconnectedError` surfaces a structured `APIResponse` instead of an exception.
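Deriving a tool schema from a function signature is the core idea behind `Tool.from_function`. A stdlib-only sketch of that idea, handling only primitive types (lm-deluge itself uses Pydantic's `TypeAdapter`, which also covers `Annotated` descriptions, TypedDicts, and Pydantic models):

```python
import inspect
import typing

PRIMITIVES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_from_function(fn):
    """Build a minimal JSON-Schema-like dict from a function signature.
    Simplified sketch: primitives only, no descriptions or nested models."""
    hints = typing.get_type_hints(fn)
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": PRIMITIVES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # params without defaults are required
    return {"type": "object", "properties": properties, "required": required}


def add(a: int, b: int = 0) -> int:
    return a + b
```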
0.0.80 · 2025-11-24
- Added `global_effort` to `SamplingParams` and Anthropic request wiring so `claude-4.5-opus` sends the new `effort` field plus beta header automatically.
- Exposed `thinking_budget` on `SamplingParams` and made it take precedence over `reasoning_effort` for Anthropic and Gemini reasoning models (with warnings to flag overlaps); Gemini flash-lite enforces its minimum budget.
- Fixed Gemini 3 request construction to always send `generationConfig.thinkingConfig` and remapped `reasoning_effort="medium"`/`None` to the provider-supported thinking levels.
- Default temperature raised to `1.0` across docs and config defaults to match current provider behavior.
- Added regression suites for Anthropic thinking budgets and Gemini reasoning/effort mapping (`tests/models/test_anthropic_thinking_budget.py`, `tests/models/test_gemini_thinking_config.py`, `tests/models/test_gemini_3_thinking_level.py`).
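The precedence rule above (explicit token budget beats named effort level, with a warning on overlap) can be sketched as a small resolver. Names and return shape are assumptions, not the library's wiring:

```python
import warnings


def resolve_thinking(reasoning_effort=None, thinking_budget=None):
    """Sketch of the precedence rule: an explicit token budget wins over a
    named effort level, and setting both emits a warning."""
    if thinking_budget is not None:
        if reasoning_effort is not None:
            warnings.warn("thinking_budget overrides reasoning_effort")
        return {"budget_tokens": thinking_budget}
    if reasoning_effort is not None:
        return {"effort": reasoning_effort}
    return {}
```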
0.0.79 · 2025-11-22
- Gemini 3 requests now send `thinkingLevel="low"` when callers specify `reasoning_effort="none"` or `"minimal"`, avoiding unexpected high-effort reasoning (and cost) when users explicitly ask for lightweight runs.
- Documented the new sandbox utilities (`ModalSandbox` and `DaytonaSandbox`) so agents can execute commands in managed remote environments with optional network blocking, stdout capture, file I/O, and preview tunnels.
0.0.78 · 2025-11-19
- Fixed `FilesystemManager.read_file` so empty files no longer throw range errors when agents omit `end_line`; blank files now return an empty snippet and accurate metadata instead of failing mid-run.
- Added regression coverage in `tests/test_filesystem.py::test_filesystem_manager_reads_empty_files` to lock the behavior down.
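The fix boils down to guarding the line-slicing path against empty input. A hypothetical helper showing the shape of that guard (not lm-deluge's actual code):

```python
def read_snippet(text, start_line=1, end_line=None):
    """Slice a 1-indexed line range out of file text without raising on
    empty files. Illustrative sketch of the guard described above."""
    lines = text.splitlines()
    if not lines:
        # Empty file: return an empty snippet plus accurate metadata.
        return {"snippet": "", "total_lines": 0}
    stop = len(lines) if end_line is None else min(end_line, len(lines))
    return {"snippet": "\n".join(lines[start_line - 1 : stop]),
            "total_lines": len(lines)}
```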
0.0.77 · 2025-11-19
- Added `FilesystemManager`, an in-memory virtual workspace + tool wrapper that gives agents sandboxed `read_file`/`write_file`/`list_dir`/`grep`/`apply_patch` capabilities without touching the host filesystem; the implementation lives in `lm_deluge.tool.prefab.filesystem`.
- Landed regression coverage in `tests/test_filesystem.py` plus a scripted live scenario in `tests/test_filesystem_live.py` so refactors keep the tool contract intact.
- Documented the new manager throughout the README, feature guide, and API reference so it is easy to wire into existing agent loops (including tips on seeding backends, exporting workspaces, and disabling commands per session).
0.0.76 · 2025-11-18
- Introduced `SubAgentManager`, a trio of tools (`start_subagent`, `check_subagent`, `wait_for_subagent`) that lets a primary agent delegate work to cheaper models; real-world coverage lives in `tests/core/test_subagent_manager.py` and the new Agent guide sections spell out the workflow.
- Shipped `TodoManager`/`TodoItem`/`TodoStatus`/`TodoPriority`, giving LLMs a first-class todo scratchpad they can mutate via `todowrite`/`todoread`; the integration suite in `tests/core/test_todo_manager.py` ensures models follow the protocol.
- `LLMClient` now exposes `start_agent_loop_nowait()` + `wait_for_agent_loop()` around a new `AgentLoopResponse`, so you can launch parallel loops and gather the `(Conversation, APIResponse)` later; `tests/core/test_agent_loop.py` adds scenarios for concurrent loops and the docs (features, agents guide, API reference) walk through the new APIs.
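The launch-now/gather-later split maps onto plain asyncio primitives. A sketch of the pattern with stand-ins (these functions are illustrative, not lm-deluge's API):

```python
import asyncio


async def agent_loop(name):
    """Stand-in for a real agent loop (model calls plus tool execution)."""
    await asyncio.sleep(0)
    return f"{name}: done"


async def main():
    # start_agent_loop_nowait(...) corresponds to create_task;
    # wait_for_agent_loop(...) corresponds to awaiting the task later.
    tasks = [asyncio.create_task(agent_loop(n)) for n in ("a", "b", "c")]
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
```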
0.0.75 · 2025-11-16
- `output_schema` now accepts raw JSON Schemas or Pydantic `BaseModel` subclasses. `lm_deluge.util.schema.prepare_output_schema()` handles the conversion to strict JSON Schema (adds `additionalProperties: false`, expands `$defs`, keeps optional fields nullable, etc.) and feeds both the Anthropic and OpenAI builders, with coverage in `tests/core/test_schema_transformations.py` and `tests/core/test_pydantic_structured_outputs.py`.
- Anthropic/OpenAI structured output requests now share the same normalization path so provider quirks stay isolated: unsupported Anthropic constraints move into descriptions while OpenAI keeps the tight grammar untouched. Regression suites for the chat and Responses APIs plus new real-run harnesses (`tests/one_off/test_anthropic_structured_outputs_real.py`, `tests/one_off/test_openai_structured_outputs_real.py`) make sure the wiring keeps working.
- Shipped `examples/pydantic_structured_outputs_example.py` and refreshed the structured outputs docs so teams can drop a Pydantic model into `LLMClient.process_prompts_*()` without hand-rolling schemas or worrying about mutation.
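The core of the strict-schema conversion is mechanical: every object node gets `additionalProperties: false` and all of its properties marked required. A simplified sketch of that step only (the real `prepare_output_schema()` also expands `$defs` and keeps optional fields nullable):

```python
def to_strict_schema(schema):
    """Recursively normalize a JSON Schema toward OpenAI's strict contract.
    Simplified sketch; sorts `required` for deterministic output."""
    if isinstance(schema, dict):
        schema = {k: to_strict_schema(v) for k, v in schema.items()}
        if schema.get("type") == "object":
            schema["additionalProperties"] = False
            schema["required"] = sorted(schema.get("properties", {}))
        return schema
    if isinstance(schema, list):
        return [to_strict_schema(v) for v in schema]
    return schema


strict = to_strict_schema({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "meta": {"type": "object", "properties": {"id": {"type": "integer"}}},
    },
})
```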
0.0.74 · 2025-11-15
- Structured outputs landed across Anthropic and OpenAI: `LLMClient(..., output_schema=...)` now pushes the JSON Schema to Claude (complete with the `structured-outputs-2025-11-13` beta and strict-tool gating) and to both OpenAI chat and Responses API requests, with schema precedence over `json_mode` everywhere.
- Tightened tool serialization so strict schemas only turn on when providers actually support it (Bedrock always forces non-strict) and made MCP-backed OpenAI Responses runs share the same strict/non-strict behavior; covered by fresh suites in `tests/core/test_openai_structured_outputs.py` and `tests/core/test_bedrock_requests.py`.
- `process_prompts_sync()` forwards `output_schema`, and the new regression test (`tests/core/test_process_prompts_sync.py`) ensures future changes keep the sync/async surfaces aligned.
- Added one-off real API coverage for OpenAI structured outputs plus a battery of deterministic unit tests so regressions in schema handling or strict tooling are caught automatically.
0.0.73 · 2025-11-13
- Added the GPT-5.1 family (standard, Codex, Codex Mini) with pricing metadata and marked them as reasoning models so they Just Work with `LLMClient`.
- Extended reasoning suffix parsing to accept `-minimal` and `-none`, enforced that Codex variants must run against the Responses API, and added guard rails that convert unsupported efforts to the closest valid value with clear warnings.
- Updated the OpenAI request builders plus the warning system so GPT-5.1 downgrades from `minimal` to `none` transparently while older models downgrade to `low`, and added coverage for the new models (`tests/models/test_gpt_5_1.py`).
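The downgrade rule above is a two-way branch on model family. A hypothetical sketch (the function and the prefix check are illustrative, not the library's implementation):

```python
def downgrade_effort(model: str, effort: str) -> str:
    """Map an unsupported "minimal" effort to the closest supported tier:
    GPT-5.1 accepts "none", older reasoning models fall back to "low"."""
    if effort != "minimal":
        return effort
    return "none" if model.startswith("gpt-5.1") else "low"
```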
0.0.72 · 2025-11-11
- Background requests now honour `request_timeout` precisely: polling uses a monotonic clock, cancels the remote response before erroring, and surfaces a structured timeout `APIResponse` instead of hanging jobs.
- Cancellation is best-effort logged when failures happen so you can trace leaked jobs during debugging.
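Monotonic-clock polling is the key detail: unlike wall-clock time, `time.monotonic()` never jumps backward, so the deadline is exact. A hypothetical helper showing the shape of the loop (not the library's code):

```python
import time


def poll_until(check, timeout_s, interval_s=0.01):
    """Poll `check` until it returns a non-None value or the deadline passes.
    Uses a monotonic clock, so wall-clock adjustments can't skew the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(interval_s)
    # Timed out: the caller cancels the remote job and builds a
    # structured timeout response instead of hanging.
    return None
```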
0.0.71 · 2025-11-10
- `Conversation.from_openai_chat()` now filters out whitespace-only text blocks and skips empty messages so bad payloads from upstream providers no longer crash tool execution.
- `MockAsyncOpenAI` does a real conversion from OpenAI tool definitions into lm-deluge `Tool` objects, wires them through `LLMClient.start()`, and carries the active `CachePattern`, so you can run copilot-style tools under tests without custom glue.
- Added a focused test suite for the mock client (`tests/test_mock_openai.py`) that exercises the OpenAI-compatible surface area.
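The payload sanitization can be sketched as a two-stage filter: drop whitespace-only text blocks, then drop any message left with no content. Names are illustrative, not lm-deluge's internals:

```python
def clean_messages(messages):
    """Drop whitespace-only text blocks, then drop now-empty messages."""
    cleaned = []
    for msg in messages:
        blocks = [
            b for b in msg.get("content", [])
            if not (b.get("type") == "text" and not b.get("text", "").strip())
        ]
        if blocks:  # skip messages that had nothing but blank text
            cleaned.append({**msg, "content": blocks})
    return cleaned


raw = [
    {"role": "user", "content": [{"type": "text", "text": "   "}]},
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
]
cleaned = clean_messages(raw)
```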
0.0.70 · 2025-11-09
- Packaging now re-exports AsyncOpenAI-style exception classes (`APIError`, `APITimeoutError`, `BadRequestError`, `RateLimitError`) so verifier harnesses can catch them directly from `lm_deluge`.
- `MockAsyncOpenAI` gained full parity with the official `AsyncOpenAI` signature: you can pass `api_key`, `organization`, `project`, custom base URLs, and call the legacy `.completions.create()` path in addition to chat completions.
- Added an async `close()` no-op for compatibility, together with extensive tests to ensure verifier integrations behave as expected.
0.0.69 · 2025-11-09
- Introduced the optional `lm-deluge[openai]` extra and shipped the first cut of `MockAsyncOpenAI`, giving you an on-device OpenAI-compatible client backed by `LLMClient`.
- Registered the first Moonshot/Kimi (`kimi-k2`, `kimi-k2-turbo`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`) and MiniMax (`minimax-m2`) models so you can swap between those providers without custom API wrappers.
- Added regression tests for the new models (`tests/models/test_kimi_and_minimax.py`) to make sure they stay callable.
0.0.67 · 2025-10-31
- Hardened `OpenAIResponsesRequest.handle_response()` so truncated/incomplete streaming payloads now produce actionable error messages (with the provider's `incomplete_details`) instead of JSON parsing failures, and fixed a dangling await in the OpenAI client path.
- Added dedicated coverage in `tests/core/test_incomplete_response.py` for both the incomplete and the successful response paths.
0.0.66 · 2025-10-31
- When you pass MCP server dictionaries (with a `url` key) through `tools` for Anthropic models, the client now automatically moves them into the `mcp_servers` array and sets the right beta header, so Anthropic's MCP integration works without any manual request massaging.
0.0.65 · 2025-10-30
- Tightened the strict-mode JSON Schema generator for tools: when `strict=True`, nested object schemas (including those inside `$defs`) have `additionalProperties: false`, defaults are stripped, and every property is marked `required`, matching OpenAI's schema contract.
- Backed the change with new tests in `tests/core/test_tool_defs.py` to ensure tools with and without `$defs` serialize correctly.
0.0.64 · 2025-10-30
- Added first-class `$defs`/`definitions` support to `Tool` plus the MCP loader so complex tool schemas with references survive serialization.
- `Tool.for_openai_completions()` now automatically includes `$defs`, rejects schemas that can't run in strict mode, and sets `additionalProperties: false` so OpenAI's strict JSON Schema validation passes out of the box.
0.0.63 · 2025-10-30
- `SamplingParams` and `LLMClient` accept `reasoning_effort="minimal"` (and `"none"`) so you can target the more efficient reasoning tiers exposed by OpenAI without hand-editing objects.
- Added regression coverage in `tests/core/test_reasoning_effort_minimal.py`.
0.0.62 · 2025-10-23
- `Message.with_file()`/`add_file()` now accept existing `File` objects, letting you build up prompts from pre-signed files without duplicates.
- Added `Message.with_remote_file()` to turn local bytes/paths into provider-hosted files asynchronously (with provider guard rails), making it easy to keep Anthropic/OpenAI file references in sync when constructing conversations.
Looking for something older? Run `git log --oneline` or inspect the GitHub release feed; this page will continue to backfill as new releases ship.