
Changelog

LM Deluge iterates quickly. This page calls out the highlights from the most recent releases, starting with v0.0.62. For a blow-by-blow history you can always inspect git log, but the notes below should help you catch up quickly.

Bugfixes for the OpenAI Responses API and client-side tools.

  • OpenAI Responses API now supports client-side tool execution: when you pass Tool objects (or local MCP servers with force_local_mcp=True) to start, start_nowait, process_prompts_async, etc., the client automatically runs an internal tool loop, calling your tools, collecting results, and continuing until the model finishes. This is effectively an agent loop (for which dedicated methods already exist), but it brings client-side tools into parity with how the Responses API behaves when every tool runs server-side: you get back a completed response, with all the tool calls and reasoning that led to it. We decided that should match whether or not you have client-side tools, so just for the Responses API, we auto-run tools (see the first sketch after this list).
  • Responses API response parsing now preserves raw item payloads (raw_item) in ToolCall.extra_body for function calls, MCP calls, web search, and other built-in tools, making it easier to reconstruct the exact request format when needed.
  • Thinking parts from Responses API now include summary and raw_payload fields for richer introspection.
  • Agent loops (run_agent_loop) now raise NotImplementedError when use_responses_api=True to prevent confusion—use start() instead, which handles the tool loop automatically.
  • Added test coverage for Responses API tool call handling in tests/core/test_openai_responses_tool_calls.py.
  • Fixed GPT-5 reasoning effort defaults: GPT-5 models no longer special-case to minimal effort when none is specified; they now follow the standard low default like other reasoning models.
  • Enabled JSON mode support (supports_json: True) for GPT-5 Codex variants (gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5-codex) and gpt-5-chat-latest.
  • Updated Cerebras model catalog: added glm-4.7-cerebras (ZAI GLM 4.7); temporarily disabled preview models (llama-4-scout, llama-4-maverick, qwen-3-235b-thinking, qwen-3-coder) pending availability.
  • Agent loops now accept an on_round_complete callback on run_agent_loop()/run_agent_loop_sync()/start_agent_loop_nowait() (and batch agent loop helpers) for per-round hooks.
  • New tool execution helpers: execute_tool_calls() (plus Tool.find()) for running ToolCall objects locally and collecting (tool_call_id, result) tuples (see the sketch after this list).
  • Conversation ergonomics: Conversation.with_tool_results() for adding tool outputs in bulk, and with_tool_result()/Message.with_tool_result() now accept dict results.
  • Added core test coverage for agent loop callbacks and the new tool helper utilities.
  • Added PhilipsHueManager prefab (lm_deluge.tool.prefab) for controlling Philips Hue lights via the local bridge API (list lights, on/off, color, brightness; HUE_BRIDGE_IP + HUE_API_KEY).
  • Added an experimental lm_deluge.pipelines.heartbeat starter for running a model on a schedule.
  • Added a one-off live test for PhilipsHueManager (tests/one_off/test_philips_hue_live.py).
  • Added get_response_files() to lm_deluge.util.anthropic_files to download Anthropic response files in-memory (optionally resolving real filenames via metadata).
  • Anthropic requests now populate APIResponse.finish_reason from stop_reason.
  • Message.user(..., file=...) now accepts a File object directly.
  • Added a one-off regression test for Anthropic finish_reason parsing (tests/one_off/test_anthropic_finish_reason.py).
  • Added Anthropic Skills support: pass skills=[Skill(...)] to start(), run_agent_loop(), or batch methods to use Anthropic’s built-in skills (xlsx, pptx) or custom uploaded skills (see the sketch after this list).
  • New Skill class (lm_deluge.Skill) for defining skills with type (anthropic/custom), skill_id, and version.
  • File download utilities in lm_deluge.util.anthropic_files: download_anthropic_file(), save_response_files(), get_anthropic_file_metadata() for retrieving files generated by skills.
  • ToolResult now includes a files field for code execution outputs, and ContainerFile TypedDict for file metadata.
  • Container ID reuse: container_id parameter on start()/run_agent_loop() and automatic reuse within agent loops to maintain state across turns.
  • Skills documentation page added to the docs site.
  • Added Amazon Nova support on Bedrock (new request handler, model registry entries, prompt/tool/image serialization, and cache-point handling).
  • Expanded Azure catalog with OpenAI-compatible model definitions, dotenv-aware AZURE_URL lookup, and Responses API enabled only for OpenAI family models.
  • Added Tavily + Brave web search managers plus a configurable WebSearchManager to mix search/fetch backends.
  • Tavily extract now guarantees markdown output by converting HTML responses with markdownify when needed.
  • New one-off test suites for Nova Bedrock, Azure models, and Tavily/Brave/WebSearchManager coverage.
  • CLI overhaul: installable deluge and deluge-server entrypoints with list, run, and agent subcommands, model filtering, JSON output, stdin/file inputs, image prompts, and MCP/prefab-enabled agent loops.
  • Model registry now tracks provider + supports_images metadata and exposes find_models() for filtering/sorting by capabilities and cost.
  • OpenRouter catalog expanded with NVIDIA Nemotron 3 Nano 30B (free/paid), Nemotron Nano 12B v2 VL (free/paid vision), Mistral Devstral 2 (free/paid), Xiaomi Mimo V2 Flash (free), AllenAI OLMo 3.1 32B Think (free), and a Trinity Mini free SKU. Removed retired Anthropic models.
  • Prompt refactor: Conversation.system/user are now instance methods (use Conversation().system(...).user(...); see the sketch after this list), Conversation.ai was added, and prompt primitives (File, Image, etc.) live under lm_deluge.prompt with top-level re-exports; RequestContext moved to lm_deluge.api_requests.context.
  • MCP tooling adds MCPServer.from_mcp_config for Claude Desktop config parsing, and MCPServer is now exported at the top level.
  • Dependencies trimmed: removed numpy/pandas; embedding stack_results() now returns Python lists only; logprob utilities use math.
  • Config cleanup: dropped SamplingParams.to_vllm and ComputerUseParams.
  • Docs and repo hygiene: added proxy server docs + nav entry, refreshed README/examples for the new Conversation builder, added lint helper scripts (banned strings/weird spaces/max lines).
  • Proxy server adds configurable model policy (allowlists, defaults, alias routes) with CLI/config support, optional request/provider logging, forwarded anthropic-beta headers, and richer Anthropic request support (thinking config, expanded content blocks).
  • Added thought signature preservation for Gemini 3 and Anthropic responses (including redacted thinking), with updated adapters and tests.
  • Sandbox prefabs reorganized into a package and expanded with a macOS-only SeatbeltSandbox and coverage for the new sandbox flows.
  • Added tinker:// OpenAI-compatible model auto-registration with multipart message flattening.
  • Message/Conversation to_log and from_log can optionally preserve image/file bytes (base64) for round-trip serialization.
  • OpenRouter catalog expands with minimax-m2.1 plus free gpt-oss-20b/gpt-oss-120b entries.
  • Provider compatibility fixes: Anthropic batch submission now posts JSON payloads (no temp JSONL), and Gemini tool schemas strip additionalProperties.
  • Added gemini-3-flash-preview model with v1alpha API endpoint and pricing ($0.50/$3.0 per million tokens input/output).
  • Gemini 3 Flash supports minimal and medium thinking levels directly, unlike Gemini 3 Pro, which only supports low and high. The request builder now detects Flash vs. Pro and passes the appropriate thinkingLevel values.
  • Added test coverage for Flash-specific thinking levels in tests/models/test_gemini_3_thinking_level.py.
  • New Recursive Language Model (RLM) manager/pipeline brings a tool-driven REPL for very long contexts, with persistent state, guarded imports, lm() fan-out, and final()/final_var() completion; covered by new core and long-context suites (including a 1.5M-char Ulysses run).
  • Added a Tantivy-powered FullTextSearch prefab (search/fetch tools) with query sanitization, optional dedupe, cached fetches, and a BrowseComp-Plus benchmark harness to stress it against ~100k docs.
  • Expanded sandbox prefabs: Modal, Daytona, Docker, and Fargate sandboxes now expose bash/file/process/tunnel helpers with async context managers and background process tracking, plus one-off tests for Docker/Daytona cleanup paths.
  • Packaging: introduced optional extras full_text_search (tantivy+lenlp) and sandbox (modal, daytona-sdk, docker) so heavyweight deps are opt-in; removed the unused deduplicate_strategy parameter from the FTS API.
  • Added xhigh reasoning effort support for GPT-5.2 and GPT-5.1-Codex-Max, the two models that support OpenAI’s new extra-high reasoning tier. Other reasoning models automatically fall back to high with a warning.
  • Model name suffixes now support -xhigh (e.g., gpt-5.2-xhigh; see the sketch after this list) alongside the existing -low, -medium, -high, -minimal, and -none suffixes.
  • Fixed GPT-5.2 and GPT-5.1-Codex-Max requests to omit temperature and top_p when reasoning is enabled, matching OpenAI’s new API constraints for these models.
  • Added supports_xhigh flag to APIModel for models that support the xhigh reasoning tier.
  • Added comprehensive test coverage in tests/models/test_xhigh_reasoning.py.
  • Fixed a critical bug in the agent loop where conversation.with_tool_result() wasn’t being reassigned, causing tool results to be silently dropped from the conversation history (see the sketch after this list).
  • OpenAI web search tool now defaults to GA mode (preview=False) instead of preview.
  • Added max_content_chars parameter to ExaWebSearchManager for controlling response size.
  • Enhanced OpenAI built-in web search tool with better configuration options.
  • Added comprehensive test coverage for OpenAI web search in tests/core/test_openai_web_search.py.
  • Added TryCua integration for computer use agents, with full executor implementation supporting screenshots, clicks, typing, scrolling, and multi-step tasks.
  • Added Anthropic built-in web search tool support with test coverage.
  • Added Gemini computer use via Kernel executor with dedicated test suite.
  • Added batch agent loops capability for running multiple agent conversations in parallel.
  • Registered new Gemini models including gemini-2.5-pro and gemini-2.5-flash.
  • More prefab tools: added Google Docs tools (metadata, ranged reads/grep, markdown-aware insert/update/replace/delete) and Google Sheets tools (list tabs, find used ranges, read ranges as HTML tables, update cells) with service-account auth. Added an Exa Web Search tool (more web search tools coming), an AWS SES Email tool, and S3-backed filesystem and memory.
  • Model catalog: registered Arcee trinity-mini (native, OpenRouter, Together), refreshed DeepSeek pricing plus new reasoner/speciale variants (including an Anthropic-compatible path), and marked Kimi thinking SKUs as reasoning models with a warning when thinking is disabled.
  • Client knobs & coverage: LLMClient now accepts global_effort and thinking_budget at construction so Anthropic-style requests carry the right effort settings, and new suites cover the prefab tools, Arcee tool-calling, DeepSeek Speciale, and S3 integrations.
  • Added _LLMClient.print_usage() and refactored StatusTracker.log_usage() so you can dump cumulative token/cost/time stats mid-run; final status output now reuses the same usage reporter.
  • Drafted a GEPA pipeline implementation plan (src/lm_deluge/pipelines/gepa/GEPA_IMPLEMENTATION_PLAN.md) outlining how to port the GEPA optimizer onto lm-deluge.
  • Tooling overhaul: Tool.from_function now uses Pydantic TypeAdapter for schemas, supports Annotated[...] descriptions, extracts return-type output_schema (with optional runtime validation), and auto-converts TypedDict/Pydantic params. Serialization still honors strict/non-strict modes automatically.
  • New prefab helpers: ToolComposer (OTC) for code-based tool orchestration, BatchTool for bundling calls, ToolSearchTool for regex discovery + invocation, and MemoryManager for long-lived notes. Todos/subagents/filesystem managers stay available under lm_deluge.tool.prefab.
  • Pipelines split: extract, translate, and score_llm now live in lm_deluge.pipelines.
  • Modal sandbox bash drops timeouts for background commands and exposes bash/list_processes/get_url (network optional); docs updated accordingly.
  • Agent ergonomics: Conversation.print() pretty-prints conversations with truncation, and Open Tool Composition prompts now render available tool signatures correctly.
  • Robustness: Anthropic requests now map global_effort to output_config.effort, and aiohttp ServerDisconnectedError surfaces a structured APIResponse instead of an exception.
  • Added global_effort to SamplingParams and Anthropic request wiring so claude-4.5-opus sends the new effort field plus beta header automatically.
  • Exposed thinking_budget on SamplingParams and made it take precedence over reasoning_effort for Anthropic and Gemini reasoning models (with warnings to flag overlaps); Gemini flash-lite enforces its minimum budget.
  • Fixed Gemini 3 request construction to always send generationConfig.thinkingConfig and remapped reasoning_effort="medium"/None to the provider-supported thinking levels.
  • Default temperature raised to 1.0 across docs and config to match current provider behavior.
  • Added regression suites for Anthropic thinking budgets and Gemini reasoning/effort mapping (tests/models/test_anthropic_thinking_budget.py, tests/models/test_gemini_thinking_config.py, tests/models/test_gemini_3_thinking_level.py).
  • Gemini 3 requests now send thinkingLevel="low" when callers specify reasoning_effort="none" or "minimal", avoiding unexpected high-effort reasoning (and cost) when users explicitly ask for lightweight runs.
  • Documented the new sandbox utilities (ModalSandbox and DaytonaSandbox) so agents can execute commands in managed remote environments with optional network blocking, stdout capture, file I/O, and preview tunnels.
  • Fixed FilesystemManager.read_file so empty files no longer throw range errors when agents omit end_line; blank files now return an empty snippet and accurate metadata instead of failing mid-run.
  • Added regression coverage in tests/test_filesystem.py::test_filesystem_manager_reads_empty_files to lock the behavior down.
  • Added FilesystemManager, an in-memory virtual workspace + tool wrapper that gives agents sandboxed read_file / write_file / list_dir / grep / apply_patch capabilities without touching the host filesystem; the implementation lives in lm_deluge.tool.prefab.filesystem.
  • Landed regression coverage in tests/test_filesystem.py plus a scripted live scenario in tests/test_filesystem_live.py so refactors keep the tool contract intact.
  • Documented the new manager throughout the README, feature guide, and API reference so it is easy to wire into existing agent loops (including tips on seeding backends, exporting workspaces, and disabling commands per session).
  • Introduced SubAgentManager, a trio of tools (start_subagent, check_subagent, wait_for_subagent) that lets a primary agent delegate work to cheaper models; real-world coverage lives in tests/core/test_subagent_manager.py and the new Agent guide sections spell out the workflow.
  • Shipped TodoManager/TodoItem/TodoStatus/TodoPriority, giving LLMs a first-class todo scratchpad they can mutate via todowrite/todoread; the integration suite in tests/core/test_todo_manager.py ensures models follow the protocol.
  • _LLMClient now exposes start_agent_loop_nowait() + wait_for_agent_loop() around a new AgentLoopResponse, so you can launch parallel loops and gather the (Conversation, APIResponse) later (see the sketch after this list); tests/core/test_agent_loop.py adds scenarios for concurrent loops, and the docs (features, agents guide, API reference) walk through the new APIs.
  • output_schema now accepts raw JSON Schemas or Pydantic BaseModel subclasses (see the sketch after this list). lm_deluge.util.schema.prepare_output_schema() handles the conversion to strict JSON Schema (adds additionalProperties: false, expands $defs, keeps optional fields nullable, etc.) and feeds both Anthropic and OpenAI builders, with coverage in tests/core/test_schema_transformations.py and tests/core/test_pydantic_structured_outputs.py.
  • Anthropic/OpenAI structured output requests now share the same normalization path so provider quirks stay isolated—unsupported Anthropic constraints move into descriptions while OpenAI keeps the tight grammar untouched. Regression suites for the chat and Responses APIs plus new real-run harnesses (tests/one_off/test_anthropic_structured_outputs_real.py, tests/one_off/test_openai_structured_outputs_real.py) make sure the wiring keeps working.
  • Shipped examples/pydantic_structured_outputs_example.py and refreshed the structured outputs docs so teams can drop a Pydantic model into LLMClient.process_prompts_*() without hand-rolling schemas or worrying about mutation.
  • Structured outputs landed across Anthropic and OpenAI: LLMClient(..., output_schema=...) now pushes the JSON Schema to Claude (complete with the structured-outputs-2025-11-13 beta and strict-tool gating) and to both OpenAI chat and Responses API requests, with schema precedence over json_mode everywhere.
  • Tightened tool serialization so strict schemas only turn on when providers actually support it (Bedrock always forces non-strict) and made MCP-backed OpenAI Responses runs share the same strict/non-strict behavior; covered by fresh suites in tests/core/test_openai_structured_outputs.py and tests/core/test_bedrock_requests.py.
  • process_prompts_sync() forwards output_schema, and the new regression test (tests/core/test_process_prompts_sync.py) ensures future changes keep the sync/async surfaces aligned.
  • Added one-off real API coverage for OpenAI structured outputs plus a battery of deterministic unit tests so regressions in schema handling or strict tooling are caught automatically.
  • Added the GPT-5.1 family (standard, Codex, Codex Mini) with pricing metadata and marked them as reasoning models so they Just Work with LLMClient.
  • Extended reasoning suffix parsing to accept -minimal and -none, enforced that Codex variants must run against the Responses API, and added guard rails that convert unsupported efforts to the closest valid value with clear warnings.
  • Updated the OpenAI request builders plus the warning system so GPT-5.1 downgrades from minimal to none transparently while older models downgrade to low, and added coverage for the new models (tests/models/test_gpt_5_1.py).
  • Background requests now honour request_timeout precisely: polling uses a monotonic clock, cancels the remote response before erroring, and surfaces a structured timeout APIResponse instead of hanging jobs.
  • Cancellation is best-effort logged when failures happen so you can trace leaked jobs during debugging.
  • Conversation.from_openai_chat() now filters out whitespace-only text blocks and skips empty messages so bad payloads from upstream providers no longer crash tool execution.
  • MockAsyncOpenAI does a real conversion from OpenAI tool definitions into lm-deluge Tool objects, wires them through LLMClient.start(), and carries the active CachePattern, so you can run copilot-style tools under tests without custom glue.
  • Added a focused test suite for the mock client (tests/test_mock_openai.py) that exercises the OpenAI-compatible surface area.
  • Packaging now re-exports AsyncOpenAI-style exception classes (APIError, APITimeoutError, BadRequestError, RateLimitError) so verifier harnesses can catch them directly from lm_deluge.
  • MockAsyncOpenAI gained full parity with the official AsyncOpenAI signature: you can pass api_key, organization, project, custom base URLs, and call the legacy .completions.create() path in addition to chat completions.
  • Added an async close() no-op for compatibility, together with extensive tests to ensure verifier integrations behave as expected.
  • Introduced the optional lm-deluge[openai] extra and shipped the first cut of MockAsyncOpenAI, giving you an on-device OpenAI-compatible client backed by LLMClient.
  • Registered the first Moonshot/Kimi (kimi-k2, kimi-k2-turbo, kimi-k2-thinking, kimi-k2-thinking-turbo) and MiniMax (minimax-m2) models so you can swap between those providers without custom API wrappers.
  • Added regression tests for the new models (tests/models/test_kimi_and_minimax.py) to make sure they stay callable.
  • Hardened OpenAIResponsesRequest.handle_response() so truncated/incomplete streaming payloads now produce actionable error messages (with the provider’s incomplete_details) instead of JSON parsing failures, and fixed a dangling await in the OpenAI client path.
  • Added dedicated coverage in tests/core/test_incomplete_response.py for both the incomplete and the successful response paths.
  • When you pass MCP server dictionaries (with a url key) through tools for Anthropic models, the client now automatically moves them into the mcp_servers array and sets the right beta header, so Anthropic’s MCP integration works without any manual request massaging.
  • Tightened the strict-mode JSON Schema generator for tools: when strict=True, nested object schemas (including those inside $defs) have additionalProperties: false, defaults are stripped, and every property is marked required, matching OpenAI’s schema contract.
  • Backed the change with new tests in tests/core/test_tool_defs.py to ensure tools with and without $defs serialize correctly.
  • Added first-class $defs/definitions support to Tool plus the MCP loader so complex tool schemas with references survive serialization.
  • Tool.for_openai_completions() now automatically includes $defs, rejects schemas that can’t run in strict mode, and sets additionalProperties: false so OpenAI’s strict JSON schema validation passes out of the box.
  • SamplingParams and LLMClient accept reasoning_effort="minimal" (and "none") so you can target the more efficient reasoning tiers exposed by OpenAI without hand-editing objects.
  • Added regression coverage in tests/core/test_reasoning_effort_minimal.py.
  • Message.with_file() / add_file() now accept existing File objects, letting you build up prompts from pre-signed files without duplicates.
  • Added Message.with_remote_file() to turn local bytes/paths into provider-hosted files asynchronously (with provider guard rails), making it easy to keep Anthropic/OpenAI file references in sync when constructing conversations.
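
A few minimal usage sketches for the items above follow. They are illustrative only and make assumptions about exact signatures, flagged inline. First, the client-side tool loop on the Responses API: this sketch assumes LLMClient takes a model name positionally, that use_responses_api is a constructor flag, and that start() is awaitable and accepts a plain prompt string.

```python
import asyncio

from lm_deluge import LLMClient, Tool


def get_weather(city: str) -> str:
    """Toy client-side tool: return a canned forecast."""
    return f"{city}: 21C and clear"


async def main():
    # Assumption: model name is positional and use_responses_api is a
    # constructor flag; adjust to match your version of the library.
    client = LLMClient("gpt-5.1", use_responses_api=True)

    # With a client-side Tool present, start() runs the internal tool loop:
    # it executes get_weather, feeds the result back, and keeps calling the
    # model until it finishes, then returns the completed response.
    response = await client.start(
        "What's the weather in Lisbon right now?",
        tools=[Tool.from_function(get_weather)],
    )
    print(response)


asyncio.run(main())
```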
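
A sketch of combining the new tool helpers with the bulk conversation method. Only the helper names come from the notes above; the import path for execute_tool_calls() and its argument order are assumptions.

```python
from lm_deluge.tool import execute_tool_calls  # import path is an assumption


def apply_tool_outputs(conversation, tool_calls, tools):
    """Run a model's ToolCalls locally and append every output in bulk."""
    # execute_tool_calls() collects (tool_call_id, result) tuples;
    # the argument order shown here is an assumption.
    results = execute_tool_calls(tool_calls, tools)
    # Reassign: with_tool_results() returns the updated conversation.
    return conversation.with_tool_results(results)
```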
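
Anthropic Skills, per the Skills bullets: a Skill carries type, skill_id, and version and is passed via skills=... to start() or an agent loop. The skill_id/version values and the model name below are assumptions.

```python
import asyncio

from lm_deluge import LLMClient, Skill


async def main():
    # Assumption: the built-in xlsx skill is addressed by skill_id="xlsx"
    # and "latest" is an acceptable version value; check the Skills docs page.
    xlsx = Skill(type="anthropic", skill_id="xlsx", version="latest")

    client = LLMClient("claude-4.5-opus")
    response = await client.start(
        "Build a spreadsheet summarizing Q3 revenue by region.",
        skills=[xlsx],
    )
    print(response)


asyncio.run(main())
```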
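
The Conversation builder from the prompt refactor, mirroring the pattern quoted in that bullet; the exact semantics of Conversation.ai are an assumption beyond "assistant turn".

```python
from lm_deluge import Conversation

# system/user/ai are instance methods now, so prompts chain off an empty
# Conversation() instead of being built through classmethods.
conv = (
    Conversation()
    .system("You are a terse assistant.")
    .user("Give me one word that means 'happy'.")
    .ai("Joyful.")  # assistant turn via the new Conversation.ai
    .user("Another one?")
)

conv.print()  # pretty-prints the conversation with truncation
```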
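
Selecting the new extra-high reasoning tier via a model-name suffix; whether LLMClient takes the model name positionally is an assumption.

```python
from lm_deluge import LLMClient

# The -xhigh suffix selects OpenAI's extra-high reasoning tier on models that
# support it (GPT-5.2, GPT-5.1-Codex-Max); other reasoning models fall back
# to high with a warning.
client = LLMClient("gpt-5.2-xhigh")
```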
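
The agent-loop fix above is a reminder that with_tool_result() returns an updated conversation rather than mutating in place, so the result must be reassigned. The argument shape shown here is an assumption.

```python
from lm_deluge import Conversation

conversation = Conversation().user("Look this up for me.")
tool_call_id = "call_123"  # placeholder id from a model's tool call

# Wrong: the updated conversation is discarded, so the tool result is lost.
conversation.with_tool_result(tool_call_id, {"status": "ok"})

# Right: reassign, because with_tool_result() returns the updated conversation
# (dict results are accepted as of this batch of releases).
conversation = conversation.with_tool_result(tool_call_id, {"status": "ok"})
```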
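
Parallel agent loops via the new nowait/wait APIs. Everything beyond the method names and the (Conversation, APIResponse) result mentioned in the bullet is an assumption, including whether start_agent_loop_nowait() itself needs awaiting.

```python
import asyncio

from lm_deluge import Conversation, LLMClient


async def main():
    client = LLMClient("gpt-5.1")  # constructor shape is an assumption

    # Launch two loops without waiting for either to finish.
    # Assumption: start_agent_loop_nowait() returns a handle/AgentLoopResponse
    # synchronously; await it first if it is a coroutine in your version.
    handles = [
        client.start_agent_loop_nowait(Conversation().user(q))
        for q in ("Summarize RFC 9110.", "Summarize RFC 9111.")
    ]

    # Gather the (Conversation, APIResponse) pairs whenever you're ready.
    for handle in handles:
        conversation, response = await client.wait_for_agent_loop(handle)
        print(response)


asyncio.run(main())
```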
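
Structured outputs with a Pydantic model, following the output_schema bullets; the constructor shape and the return value of process_prompts_sync() are assumptions.

```python
from pydantic import BaseModel

from lm_deluge import LLMClient


class Invoice(BaseModel):
    vendor: str
    total_usd: float
    line_items: list[str]


# output_schema accepts the BaseModel subclass directly; lm-deluge converts it
# to a strict JSON Schema (additionalProperties: false, expanded $defs,
# nullable optionals) and wires it into Anthropic and OpenAI request builders.
client = LLMClient("gpt-5.1", output_schema=Invoice)

# process_prompts_sync() forwards output_schema as of this batch of releases.
responses = client.process_prompts_sync(
    ["Extract the invoice fields: ACME Corp, $1,200 total, 3 widgets plus shipping."]
)
print(responses[0])
```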

Looking for something older? Run git log --oneline or inspect the GitHub release feed—this page will continue to backfill as new releases ship.