# Changelog

LM Deluge iterates quickly. This page calls out the highlights from the most recent releases, starting with v0.0.62. For a blow-by-blow history you can always inspect `git log`, but the sections below should help you catch up quickly.
## 0.0.99 · 2026-01-10

- Bugfixes for the OpenAI Responses API + client-side tools.
## 0.0.98 · 2026-01-10

- The OpenAI Responses API now supports client-side tool execution: when you pass `Tool` objects (or local MCP servers with `force_local_mcp=True`) to `start`, `start_nowait`, `process_prompts_async`, etc., the client automatically runs an internal tool loop—calling your tools, collecting results, and continuing until the model finishes (see the sketch below). This is effectively an agent loop (which already has dedicated methods), but it brings client-side tools into parity with how the Responses API behaves when no client-side tools are involved: you get back a completed response with all the tool calls and reasoning that led to it. That behavior now holds whether or not you have client-side tools, so for the Responses API only, tools are auto-run.
- Responses API response parsing now preserves raw item payloads (`raw_item`) in `ToolCall.extra_body` for function calls, MCP calls, web search, and other built-in tools, making it easier to reconstruct the exact request format when needed. `Thinking` parts from the Responses API now include `summary` and `raw_payload` fields for richer introspection.
- Agent loops (`run_agent_loop`) now raise `NotImplementedError` when `use_responses_api=True` to prevent confusion—use `start()` instead, which handles the tool loop automatically.
- Added test coverage for Responses API tool call handling in `tests/core/test_openai_responses_tool_calls.py`.
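For illustration, a minimal sketch of the new flow. The model name, the placement of the `use_responses_api` flag, and the `.completion` attribute are assumptions here, not guaranteed API:

```python
import asyncio
from lm_deluge import LLMClient, Tool

def get_weather(city: str) -> str:
    """Toy client-side tool for illustration."""
    return f"It is sunny in {city}."

async def main():
    client = LLMClient("gpt-5-mini")
    # With client-side tools on the Responses API path, the internal tool
    # loop runs automatically and a completed response comes back.
    results = await client.process_prompts_async(
        ["What's the weather in Tokyo?"],
        tools=[Tool.from_function(get_weather)],
        use_responses_api=True,  # flag placement assumed
    )
    print(results[0].completion)  # attribute name assumed

asyncio.run(main())
```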
## 0.0.97 · 2026-01-10

- Fixed GPT-5 reasoning effort defaults: GPT-5 models no longer special-case to `minimal` effort when none is specified; they now follow the standard `low` default like other reasoning models.
- Enabled JSON mode support (`supports_json: True`) for GPT-5 Codex variants (`gpt-5.1-codex`, `gpt-5.1-codex-mini`, `gpt-5-codex`) and `gpt-5-chat-latest`.
- Updated the Cerebras model catalog: added `glm-4.7-cerebras` (ZAI GLM 4.7); temporarily disabled preview models (`llama-4-scout`, `llama-4-maverick`, `qwen-3-235b-thinking`, `qwen-3-coder`) pending availability.
## 0.0.96 · 2026-01-07

- Agent loops now accept an `on_round_complete` callback on `run_agent_loop()`/`run_agent_loop_sync()`/`start_agent_loop_nowait()` (and the batch agent loop helpers) for per-round hooks; see the sketch after this list.
- New tool execution helpers: `execute_tool_calls()` (plus `Tool.find()`) for running `ToolCall`s locally and collecting `(tool_call_id, result)` tuples.
- Conversation ergonomics: `Conversation.with_tool_results()` for adding tool outputs in bulk, and `with_tool_result()`/`Message.with_tool_result()` now accept dict results.
- Added core test coverage for agent loop callbacks and the new tool helper utilities.
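A hedged sketch of the per-round hook; the callback signature and the `conversation.messages` attribute are assumptions:

```python
from lm_deluge import LLMClient, Tool

def shout(text: str) -> str:
    """Toy tool that upper-cases its input."""
    return text.upper()

def log_round(conversation, response):
    # Hypothetical callback signature; fires once per agent-loop round.
    print(f"round complete, {len(conversation.messages)} messages so far")

client = LLMClient("claude-4.5-sonnet")  # model name illustrative
conversation, response = client.run_agent_loop_sync(
    "Please shout 'hello' using the tool",
    tools=[Tool.from_function(shout)],
    on_round_complete=log_round,
)
```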
## 0.0.95 · 2026-01-06

- Added a `PhilipsHueManager` prefab (`lm_deluge.tool.prefab`) for controlling Philips Hue lights via the local bridge API (list lights, on/off, color, brightness; requires `HUE_BRIDGE_IP` + `HUE_API_KEY`); a sketch follows this list.
- Added an experimental `lm_deluge.pipelines.heartbeat` starter for running a model on a schedule.
- Added a one-off live test for `PhilipsHueManager` (`tests/one_off/test_philips_hue_live.py`).
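A sketch of wiring the prefab into an agent loop, assuming the manager exposes its tool list via a `.tools` attribute (the exact surface may differ):

```python
from lm_deluge import LLMClient
from lm_deluge.tool.prefab import PhilipsHueManager

# Requires HUE_BRIDGE_IP and HUE_API_KEY in the environment.
hue = PhilipsHueManager()
client = LLMClient("gpt-5-mini")  # model name illustrative
conversation, response = client.run_agent_loop_sync(
    "Dim the living room lights to 30%",
    tools=hue.tools,  # attribute name assumed
)
```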
## 0.0.94 · 2026-01-01

- Added `get_response_files()` to `lm_deluge.util.anthropic_files` to download Anthropic response files in-memory (optionally resolving real filenames via metadata).
- Anthropic requests now populate `APIResponse.finish_reason` from `stop_reason`.
- `Message.user(..., file=...)` now accepts a `File` object directly.
- Added a one-off regression test for Anthropic `finish_reason` parsing (`tests/one_off/test_anthropic_finish_reason.py`).
## 0.0.93 · 2026-01-01

- Added Anthropic Skills support: pass `skills=[Skill(...)]` to `start()`, `run_agent_loop()`, or the batch methods to use Anthropic's built-in skills (xlsx, pptx) or custom uploaded skills; see the sketch after this list.
- New `Skill` class (`lm_deluge.Skill`) for defining skills with `type` (anthropic/custom), `skill_id`, and `version`.
- File download utilities in `lm_deluge.util.anthropic_files`: `download_anthropic_file()`, `save_response_files()`, and `get_anthropic_file_metadata()` for retrieving files generated by skills.
- `ToolResult` now includes a `files` field for code execution outputs, and a `ContainerFile` TypedDict for file metadata.
- Container ID reuse: a `container_id` parameter on `start()`/`run_agent_loop()` and automatic reuse within agent loops to maintain state across turns.
- Added a Skills documentation page to the docs site.
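A hedged sketch of a skills run, assuming `start()` is awaitable and `save_response_files()` takes a response plus an output directory (exact signatures may differ):

```python
import asyncio
from lm_deluge import LLMClient, Skill
from lm_deluge.util.anthropic_files import save_response_files

async def main():
    client = LLMClient("claude-4.5-sonnet")  # model name illustrative
    response = await client.start(
        "Build a spreadsheet of monthly revenue from the figures below: ...",
        skills=[Skill(type="anthropic", skill_id="xlsx")],  # version optional
    )
    # Persist any files the skill generated (signature assumed).
    save_response_files(response, "./outputs")

asyncio.run(main())
```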
## 0.0.92 · 2026-01-01

- Added Amazon Nova support on Bedrock (new request handler, model registry entries, prompt/tool/image serialization, and cache-point handling).
- Expanded the Azure catalog with OpenAI-compatible model definitions, dotenv-aware `AZURE_URL` lookup, and the Responses API enabled only for OpenAI-family models.
- Added Tavily + Brave web search managers plus a configurable `WebSearchManager` to mix search/fetch backends.
- Tavily extract now guarantees markdown output by converting HTML responses with markdownify when needed.
- New one-off test suites for Nova Bedrock, Azure models, and Tavily/Brave/WebSearchManager coverage.
## 0.0.91 · 2025-12-28

- CLI overhaul: installable `deluge` and `deluge-server` entrypoints with `list`, `run`, and `agent` subcommands, model filtering, JSON output, stdin/file inputs, image prompts, and MCP/prefab-enabled agent loops.
- The model registry now tracks provider + `supports_images` metadata and exposes `find_models()` for filtering/sorting by capabilities and cost.
- OpenRouter catalog expanded with NVIDIA Nemotron 3 Nano 30B (free/paid), Nemotron Nano 12B v2 VL (free/paid vision), Mistral Devstral 2 (free/paid), Xiaomi Mimo V2 Flash (free), AllenAI OLMo 3.1 32B Think (free), and a Trinity Mini free SKU. Removed retired Anthropic models.
- Prompt refactor: `Conversation.system`/`user` are now instance methods (use `Conversation().system(...).user(...)`, as sketched below), added `Conversation.ai`, and prompt primitives (`File`, `Image`, etc.) live under `lm_deluge.prompt` with top-level re-exports; `RequestContext` moved to `lm_deluge.api_requests.context`.
- MCP tooling adds `MCPServer.from_mcp_config` for Claude Desktop config parsing, and `MCPServer` is now exported at the top level.
- Dependencies trimmed: removed numpy/pandas; embedding `stack_results()` now returns Python lists only; logprob utilities use `math`.
- Config cleanup: dropped `SamplingParams.to_vllm` and `ComputerUseParams`.
- Docs and repo hygiene: added proxy server docs + nav entry, refreshed README/examples for the new Conversation builder, and added lint helper scripts (banned strings/weird spaces/max lines).
## 0.0.90 · 2025-12-26

- The proxy server adds a configurable model policy (allowlists, defaults, alias routes) with CLI/config support, optional request/provider logging, forwarded `anthropic-beta` headers, and richer Anthropic request support (thinking config, expanded content blocks).
- Added thought signature preservation for Gemini 3 and Anthropic responses (including redacted thinking), with updated adapters and tests.
- Sandbox prefabs reorganized into a package and expanded with a macOS-only SeatbeltSandbox and coverage for the new sandbox flows.
- Added `tinker://` OpenAI-compatible model auto-registration with multipart message flattening.
- `Message`/`Conversation` `to_log` and `from_log` can optionally preserve image/file bytes (base64) for round-trip serialization.
- OpenRouter catalog expands with `minimax-m2.1` plus free `gpt-oss-20b`/`gpt-oss-120b` entries.
- Provider compatibility fixes: Anthropic batch submission now posts JSON payloads (no temp JSONL), and Gemini tool schemas strip `additionalProperties`.
## 0.0.89 · 2025-12-17

- Added the `gemini-3-flash-preview` model with the v1alpha API endpoint and pricing ($0.50/$3.00 per million input/output tokens).
- Gemini 3 Flash supports `minimal` and `medium` thinking levels directly, unlike Gemini 3 Pro, which only supports `low` and `high`. The request builder now detects Flash vs. Pro and passes the appropriate `thinkingLevel` values.
- Added test coverage for Flash-specific thinking levels in `tests/models/test_gemini_3_thinking_level.py`.
## 0.0.88 · 2025-12-15

- A new Recursive Language Model (RLM) manager/pipeline brings a tool-driven REPL for very long contexts, with persistent state, guarded imports, `lm()` fan-out, and `final()`/`final_var()` completion; covered by new core and long-context suites (including a 1.5M-char Ulysses run).
- Added a Tantivy-powered FullTextSearch prefab (`search`/`fetch` tools) with query sanitization, optional dedupe, cached fetches, and a BrowseComp-Plus benchmark harness to stress it against ~100k docs.
- Expanded sandbox prefabs: Modal, Daytona, Docker, and Fargate sandboxes now expose bash/file/process/tunnel helpers with async context managers and background process tracking, plus one-off tests for Docker/Daytona cleanup paths.
- Packaging: introduced optional extras `full_text_search` (tantivy + lenlp) and `sandbox` (modal, daytona-sdk, docker) so heavyweight deps are opt-in; removed the unused `deduplicate_strategy` parameter from the FTS API.
## 0.0.87 · 2025-12-11

- Added `xhigh` reasoning effort support for GPT-5.2 and GPT-5.1-Codex-Max, the two models that support OpenAI's new extra-high reasoning tier. Other reasoning models automatically fall back to `high` with a warning.
- Model name suffixes now support `-xhigh` (e.g., `gpt-5.2-xhigh`) alongside the existing `-low`, `-medium`, `-high`, `-minimal`, and `-none` suffixes; see the example below.
- Fixed GPT-5.2 and GPT-5.1-Codex-Max requests to omit `temperature` and `top_p` when reasoning is enabled, matching OpenAI's new API constraints for these models.
- Added a `supports_xhigh` flag to `APIModel` for models that support the xhigh reasoning tier.
- Added comprehensive test coverage in `tests/models/test_xhigh_reasoning.py`.
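A minimal example of the suffix form, purely for illustration:

```python
from lm_deluge import LLMClient

# "-xhigh" now parses like the existing effort suffixes.
client = LLMClient("gpt-5.2-xhigh")
# Models without xhigh support fall back to "high" with a warning.
```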
## 0.0.86 · 2025-12-04

- Fixed a critical bug in the agent loop where `conversation.with_tool_result()` wasn't being reassigned, causing tool results to be silently dropped from the conversation history.
- The OpenAI web search tool now defaults to GA mode (`preview=False`) instead of preview.
## 0.0.85 · 2025-12-04

- Added a `max_content_chars` parameter to `ExaWebSearchManager` for controlling response size.
- Enhanced the OpenAI built-in web search tool with better configuration options.
- Added comprehensive test coverage for OpenAI web search in `tests/core/test_openai_web_search.py`.
## 0.0.84 · 2025-12-04

- Added TryCua integration for computer use agents, with a full executor implementation supporting screenshots, clicks, typing, scrolling, and multi-step tasks.
- Added Anthropic built-in web search tool support with test coverage.
- Added Gemini computer use via the Kernel executor with a dedicated test suite.
- Added a batch agent loops capability for running multiple agent conversations in parallel.
- Registered new Gemini models including `gemini-2.5-pro` and `gemini-2.5-flash`.
## 0.0.83 · 2025-12-02

- More prefab tools: added Google Docs tools (metadata, ranged reads/grep, markdown-aware insert/update/replace/delete) and Google Sheets tools (list tabs, find used ranges, read ranges as HTML tables, update cells) with service-account auth. Also added an Exa web search tool (more web search tools coming), an AWS SES email tool, and S3-backed filesystem and memory.
- Model catalog: registered Arcee `trinity-mini` (native, OpenRouter, Together), refreshed DeepSeek pricing plus new reasoner/speciale variants (including an Anthropic-compatible path), and marked Kimi thinking SKUs as reasoning models with a warning when thinking is disabled.
- Client knobs & coverage: `LLMClient` now accepts `global_effort` and `thinking_budget` at construction so Anthropic-style requests carry the right effort settings, and new suites cover the prefab tools, Arcee tool-calling, DeepSeek Speciale, and S3 integrations.
## 0.0.82 · 2025-11-30

- Added `_LLMClient.print_usage()` and refactored `StatusTracker.log_usage()` so you can dump cumulative token/cost/time stats mid-run; the final status output now reuses the same usage reporter.
- Drafted a GEPA pipeline implementation plan (`src/lm_deluge/pipelines/gepa/GEPA_IMPLEMENTATION_PLAN.md`) outlining how to port the GEPA optimizer onto lm-deluge.
## 0.0.81 · 2025-11-26

- Tooling overhaul: `Tool.from_function` now uses Pydantic `TypeAdapter` for schemas, supports `Annotated[...]` descriptions, extracts a return-type `output_schema` (with optional runtime validation), and auto-converts TypedDict/Pydantic params (see the sketch after this list). Serialization still honors strict/non-strict modes automatically.
- New prefab helpers: `ToolComposer` (OTC) for code-based tool orchestration, `BatchTool` for bundling calls, `ToolSearchTool` for regex discovery + invocation, and `MemoryManager` for long-lived notes. Todos/subagents/filesystem managers stay available under `lm_deluge.tool.prefab`.
- Pipelines split: `extract`, `translate`, and `score_llm` now live in `lm_deluge.pipelines`.
- Modal sandbox `bash` drops timeouts for background commands and exposes `bash`/`list_processes`/`get_url` (network optional); docs updated accordingly.
- Agent ergonomics: `Conversation.print()` pretty-prints conversations with truncation, and Open Tool Composition prompts now render available tool signatures correctly.
- Robustness: Anthropic requests now map `global_effort` to `output_config.effort`, and aiohttp `ServerDisconnectedError` surfaces a structured `APIResponse` instead of an exception.
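A sketch of the richer `Tool.from_function` schema extraction; the tool body is a toy:

```python
from typing import Annotated
from lm_deluge import Tool

def search_docs(
    query: Annotated[str, "Full-text query to run against the docs index"],
    limit: Annotated[int, "Maximum number of hits to return"] = 5,
) -> list[str]:
    """Search the documentation and return matching snippets."""
    return [f"result {i} for {query!r}" for i in range(limit)]

tool = Tool.from_function(search_docs)
# The TypeAdapter-based generator picks up the Annotated descriptions,
# and the return annotation becomes the tool's output_schema.
```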
## 0.0.80 · 2025-11-24

- Added `global_effort` to `SamplingParams` and Anthropic request wiring so `claude-4.5-opus` sends the new `effort` field plus the beta header automatically.
- Exposed `thinking_budget` on `SamplingParams` and made it take precedence over `reasoning_effort` for Anthropic and Gemini reasoning models (with warnings to flag overlaps; a sketch follows this list); Gemini flash-lite enforces its minimum budget.
- Fixed Gemini 3 request construction to always send `generationConfig.thinkingConfig` and remapped `reasoning_effort="medium"`/`None` to the provider-supported thinking levels.
- Default temperature raised to `1.0` across docs and config defaults to match current provider behavior.
- Added regression suites for Anthropic thinking budgets and Gemini reasoning/effort mapping (`tests/models/test_anthropic_thinking_budget.py`, `tests/models/test_gemini_thinking_config.py`, `tests/models/test_gemini_3_thinking_level.py`).
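A hedged sketch of the precedence behavior; the budget value and the `sampling_params` kwarg name are assumptions:

```python
from lm_deluge import LLMClient, SamplingParams

# thinking_budget wins over reasoning_effort when both are set
# (a warning flags the overlap).
params = SamplingParams(thinking_budget=4096, reasoning_effort="high")
client = LLMClient("claude-4.5-opus", sampling_params=params)  # kwarg name assumed
```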
## 0.0.79 · 2025-11-22

- Gemini 3 requests now send `thinkingLevel="low"` when callers specify `reasoning_effort="none"` or `"minimal"`, avoiding unexpected high-effort reasoning (and cost) when users explicitly ask for lightweight runs.
- Documented the new sandbox utilities (`ModalSandbox` and `DaytonaSandbox`) so agents can execute commands in managed remote environments with optional network blocking, stdout capture, file I/O, and preview tunnels.
## 0.0.78 · 2025-11-19

- Fixed `FilesystemManager.read_file` so empty files no longer throw range errors when agents omit `end_line`; blank files now return an empty snippet and accurate metadata instead of failing mid-run.
- Added regression coverage in `tests/test_filesystem.py::test_filesystem_manager_reads_empty_files` to lock the behavior down.
## 0.0.77 · 2025-11-19

- Added `FilesystemManager`, an in-memory virtual workspace + tool wrapper that gives agents sandboxed `read_file`/`write_file`/`list_dir`/`grep`/`apply_patch` capabilities without touching the host filesystem; the implementation lives in `lm_deluge.tool.prefab.filesystem`. A sketch follows this list.
- Landed regression coverage in `tests/test_filesystem.py` plus a scripted live scenario in `tests/test_filesystem_live.py` so refactors keep the tool contract intact.
- Documented the new manager throughout the README, feature guide, and API reference so it is easy to wire into existing agent loops (including tips on seeding backends, exporting workspaces, and disabling commands per session).
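A sketch of handing the virtual workspace to an agent loop, assuming the manager exposes its tools via a `.tools` attribute:

```python
from lm_deluge import LLMClient
from lm_deluge.tool.prefab.filesystem import FilesystemManager

fs = FilesystemManager()  # in-memory workspace; the host filesystem is never touched
client = LLMClient("gpt-5-mini")  # model name illustrative
conversation, response = client.run_agent_loop_sync(
    "Create notes.md with a one-line summary of this session",
    tools=fs.tools,  # attribute name assumed
)
```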
## 0.0.76 · 2025-11-18

- Introduced `SubAgentManager`, a trio of tools (`start_subagent`, `check_subagent`, `wait_for_subagent`) that lets a primary agent delegate work to cheaper models; real-world coverage lives in `tests/core/test_subagent_manager.py`, and the new Agent guide sections spell out the workflow.
- Shipped `TodoManager`/`TodoItem`/`TodoStatus`/`TodoPriority`, giving LLMs a first-class todo scratchpad they can mutate via `todowrite`/`todoread`; the integration suite in `tests/core/test_todo_manager.py` ensures models follow the protocol.
- `_LLMClient` now exposes `start_agent_loop_nowait()` + `wait_for_agent_loop()` around a new `AgentLoopResponse`, so you can launch parallel loops and gather the `(Conversation, APIResponse)` later (sketched below); `tests/core/test_agent_loop.py` adds scenarios for concurrent loops, and the docs (features, agents guide, API reference) walk through the new APIs.
## 0.0.75 · 2025-11-16

- `output_schema` now accepts raw JSON Schemas or Pydantic `BaseModel` subclasses (see the sketch after this list). `lm_deluge.util.schema.prepare_output_schema()` handles the conversion to strict JSON Schema (adds `additionalProperties: false`, expands `$defs`, keeps optional fields nullable, etc.) and feeds both the Anthropic and OpenAI builders, with coverage in `tests/core/test_schema_transformations.py` and `tests/core/test_pydantic_structured_outputs.py`.
- Anthropic/OpenAI structured output requests now share the same normalization path so provider quirks stay isolated—unsupported Anthropic constraints move into descriptions while OpenAI keeps the tight grammar untouched. Regression suites for the chat and Responses APIs plus new real-run harnesses (`tests/one_off/test_anthropic_structured_outputs_real.py`, `tests/one_off/test_openai_structured_outputs_real.py`) make sure the wiring keeps working.
- Shipped `examples/pydantic_structured_outputs_example.py` and refreshed the structured outputs docs so teams can drop a Pydantic model into `LLMClient.process_prompts_*()` without hand-rolling schemas or worrying about mutation.
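A sketch of dropping a Pydantic model in directly (model name illustrative):

```python
from pydantic import BaseModel
from lm_deluge import LLMClient

class Invoice(BaseModel):
    vendor: str
    total: float
    line_items: list[str]

# prepare_output_schema() turns the model into strict JSON Schema
# (additionalProperties: false, expanded $defs) behind the scenes.
client = LLMClient("gpt-5-mini", output_schema=Invoice)
results = client.process_prompts_sync(["Extract the invoice fields: ..."])
```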
## 0.0.74 · 2025-11-15

- Structured outputs landed across Anthropic and OpenAI: `LLMClient(..., output_schema=...)` now pushes the JSON Schema to Claude (complete with the `structured-outputs-2025-11-13` beta and strict-tool gating) and to both OpenAI chat and Responses API requests, with schema precedence over `json_mode` everywhere.
- Tightened tool serialization so strict schemas only turn on when providers actually support them (Bedrock always forces non-strict) and made MCP-backed OpenAI Responses runs share the same strict/non-strict behavior; covered by fresh suites in `tests/core/test_openai_structured_outputs.py` and `tests/core/test_bedrock_requests.py`.
- `process_prompts_sync()` forwards `output_schema`, and a new regression test (`tests/core/test_process_prompts_sync.py`) ensures future changes keep the sync/async surfaces aligned.
- Added one-off real API coverage for OpenAI structured outputs plus a battery of deterministic unit tests so regressions in schema handling or strict tooling are caught automatically.
## 0.0.73 · 2025-11-13

- Added the GPT-5.1 family (standard, Codex, Codex Mini) with pricing metadata and marked them as reasoning models so they Just Work with `LLMClient`.
- Extended reasoning suffix parsing to accept `-minimal` and `-none`, enforced that Codex variants must run against the Responses API, and added guard rails that convert unsupported efforts to the closest valid value with clear warnings.
- Updated the OpenAI request builders plus the warning system so GPT-5.1 downgrades from `minimal` to `none` transparently while older models downgrade to `low`, and added coverage for the new models (`tests/models/test_gpt_5_1.py`).
## 0.0.72 · 2025-11-11

- Background requests now honour `request_timeout` precisely: polling uses a monotonic clock, cancels the remote response before erroring, and surfaces a structured timeout `APIResponse` instead of hanging jobs.
- Cancellation failures are logged on a best-effort basis so you can trace leaked jobs during debugging.
## 0.0.71 · 2025-11-10

- `Conversation.from_openai_chat()` now filters out whitespace-only text blocks and skips empty messages so bad payloads from upstream providers no longer crash tool execution.
- `MockAsyncOpenAI` does a real conversion from OpenAI tool definitions into lm-deluge `Tool` objects, wires them through `LLMClient.start()`, and carries the active `CachePattern`, so you can run copilot-style tools under tests without custom glue.
- Added a focused test suite for the mock client (`tests/test_mock_openai.py`) that exercises the OpenAI-compatible surface area.
## 0.0.70 · 2025-11-09

- Packaging now re-exports AsyncOpenAI-style exception classes (`APIError`, `APITimeoutError`, `BadRequestError`, `RateLimitError`) so verifier harnesses can catch them directly from `lm_deluge`.
- `MockAsyncOpenAI` gained full parity with the official `AsyncOpenAI` signature: you can pass `api_key`, `organization`, `project`, and custom base URLs, and call the legacy `.completions.create()` path in addition to chat completions; a sketch follows this list.
- Added an async `close()` no-op for compatibility, together with extensive tests to ensure verifier integrations behave as expected.
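A hedged sketch of the mock client in a test, assuming it is importable from the package root once the `lm-deluge[openai]` extra is installed:

```python
import asyncio
from lm_deluge import MockAsyncOpenAI  # import path assumed

async def main():
    # Constructor mirrors AsyncOpenAI; the key is accepted for parity, not used.
    client = MockAsyncOpenAI(api_key="not-a-real-key")
    resp = await client.chat.completions.create(
        model="gpt-5-mini",  # routed through LLMClient under the hood
        messages=[{"role": "user", "content": "Say hi"}],
    )
    print(resp.choices[0].message.content)
    await client.close()  # no-op, kept for compatibility

asyncio.run(main())
```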
## 0.0.69 · 2025-11-09

- Introduced the optional `lm-deluge[openai]` extra and shipped the first cut of `MockAsyncOpenAI`, giving you an on-device OpenAI-compatible client backed by `LLMClient`.
- Registered the first Moonshot/Kimi (`kimi-k2`, `kimi-k2-turbo`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`) and MiniMax (`minimax-m2`) models so you can swap between those providers without custom API wrappers.
- Added regression tests for the new models (`tests/models/test_kimi_and_minimax.py`) to make sure they stay callable.
## 0.0.67 · 2025-10-31

- Hardened `OpenAIResponsesRequest.handle_response()` so truncated/incomplete streaming payloads now produce actionable error messages (with the provider's `incomplete_details`) instead of JSON parsing failures, and fixed a dangling await in the OpenAI client path.
- Added dedicated coverage in `tests/core/test_incomplete_response.py` for both the incomplete and successful response paths.
## 0.0.66 · 2025-10-31

- When you pass MCP server dictionaries (with a `url` key) through `tools` for Anthropic models, the client now automatically moves them into the `mcp_servers` array and sets the right beta header, so Anthropic's MCP integration works without any manual request massaging. A sketch follows below.
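A hedged sketch; the dict keys beyond `url`, the awaitable `start()`, and the `.completion` attribute are assumptions:

```python
import asyncio
from lm_deluge import LLMClient

async def main():
    client = LLMClient("claude-4.5-sonnet")  # model name illustrative
    # Dicts with a "url" key are hoisted into mcp_servers automatically,
    # and the required beta header is set for you.
    response = await client.start(
        "What tools does the docs server expose?",
        tools=[{"url": "https://example.com/mcp", "name": "docs"}],
    )
    print(response.completion)

asyncio.run(main())
```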
## 0.0.65 · 2025-10-30

- Tightened the strict-mode JSON Schema generator for tools: when `strict=True`, nested object schemas (including those inside `$defs`) have `additionalProperties: false`, defaults are stripped, and every property is marked `required`, matching OpenAI's schema contract.
- Backed the change with new tests in `tests/core/test_tool_defs.py` to ensure tools with and without `$defs` serialize correctly.
## 0.0.64 · 2025-10-30

- Added first-class `$defs`/`definitions` support to `Tool` plus the MCP loader so complex tool schemas with references survive serialization.
- `Tool.for_openai_completions()` now automatically includes `$defs`, rejects schemas that can't run in strict mode, and sets `additionalProperties: false` so OpenAI's strict JSON schema validation passes out of the box.
## 0.0.63 · 2025-10-30

- `SamplingParams` and `LLMClient` accept `reasoning_effort="minimal"` (and `"none"`) so you can target the more efficient reasoning tiers exposed by OpenAI without hand-editing objects; see the example below.
- Added regression coverage in `tests/core/test_reasoning_effort_minimal.py`.
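A one-liner, for illustration (model name assumed):

```python
from lm_deluge import LLMClient

client = LLMClient("gpt-5-mini", reasoning_effort="minimal")
```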
## 0.0.62 · 2025-10-23

- `Message.with_file()`/`add_file()` now accept existing `File` objects, letting you build up prompts from pre-signed files without duplicates.
- Added `Message.with_remote_file()` to turn local bytes/paths into provider-hosted files asynchronously (with provider guard rails), making it easy to keep Anthropic/OpenAI file references in sync when constructing conversations.
Looking for something older? Run `git log --oneline` or inspect the GitHub release feed—this page will continue to backfill as new releases ship.