Changelog¶
All notable changes to looplet are documented here. The format is
based on Keep a Changelog, and
this project adheres to Semantic Versioning.
[Unreleased]¶
Added¶
- Unified workspace reference grammar. Three forms, one resolver,
applied uniformly to every string value the cartridge loader
processes:

    max_steps: ${runtime.max_steps:-15}       # per-invocation data
    compact_service: ${ref:compact_service}   # resource registry
    state: ${py:my.app.state:MyState}         # imported Python symbol

  ${runtime.x} supports nested lookup (${runtime.a.b}) and defaults (${runtime.x:-15}). The legacy "@name" form continues to work as an alias for ${ref:name}, so existing cartridges load unchanged. See docs/cartridge.md#reference-grammar for the full spec.
- AgentPreset.resources field. The cartridge loader now populates this dict with every resource it built. Callers that need post-load access to live objects (benchmarks, evidence-bundle writers, SDK shims) read from preset.resources by name, with no more module-hunting to reach a resource the cartridge constructed. Empty for presets built directly in code.
- Declarative state: directive in config.yaml. Cartridges can now describe their state class via the same grammar instead of relying on the state_factory constructor arg of cartridge_to_preset:

    state: ${py:my.app.state:MyAgentState}   # → MyAgentState(max_steps=...)
    state: ${ref:my_state}                   # → resource as-is

  Priority: state: directive > state_factory arg > DefaultState. Closes the last gap where a non-trivial workspace had to write setup.py just to attach a custom state class.
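As a rough illustration of the ${runtime.x} form described above, the sketch below resolves nested lookups and :- defaults with a single regex. The function and pattern names are hypothetical and this is not looplet's actual resolver; the ${ref:...} and ${py:...} forms are left out.

```python
import re
from typing import Any, Mapping

# Hypothetical pattern: matches ${runtime.a.b} and ${runtime.a.b:-default}.
_RUNTIME_REF = re.compile(r"\$\{runtime\.([A-Za-z0-9_.]+)(?::-([^}]*))?\}")
_MISSING = object()


def _lookup(data: Mapping[str, Any], dotted: str) -> Any:
    """Walk a dotted path through nested mappings; _MISSING if any part is absent."""
    current: Any = data
    for part in dotted.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return _MISSING
        current = current[part]
    return current


def resolve_runtime(value: str, runtime: Mapping[str, Any]) -> str:
    """Substitute every ${runtime...} reference found in a string value."""

    def _sub(match):
        found = _lookup(runtime, match.group(1))
        if found is not _MISSING:
            return str(found)
        if match.group(2) is not None:  # a ":-default" was supplied
            return match.group(2)
        raise KeyError(f"unresolved runtime reference: {match.group(0)}")

    return _RUNTIME_REF.sub(_sub, value)
```

In this sketch a reference with no value and no default raises, which is one possible policy; the real loader may behave differently.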
Fixed¶
- YAML parser now skips full-line comments.
Lines starting with # used to raise CartridgeSerializationError. The same fix is applied to the runtime-substitution pre-pass, so ${runtime.x} inside a YAML comment doesn't fire the regex.
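A minimal sketch of the idea behind this fix, assuming the pre-pass works line by line: full-line comments are dropped before any substitution runs, so a reference inside a comment is never seen. The helper name is hypothetical.

```python
def strip_full_line_comments(text: str) -> str:
    """Drop lines whose first non-blank character is '#'.

    Trailing comments after a value are left alone; only full-line
    comments are removed in this sketch.
    """
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("#")]
    return "\n".join(kept)
```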
Removed (BREAKING)¶
- looplet.flags module deleted. All feature flags migrated to LoopConfig fields in 0.1.6 and the module had been deprecated since. The FLAGS singleton, _Flags class, and LOOPLET_* environment variables are gone. Use the equivalent LoopConfig fields directly: LoopConfig(concurrent_dispatch=True), LoopConfig(reactive_recovery=True), etc.
- looplet.scaffolding.StallDetector class deleted. Superseded by StagnationHook in looplet.stagnation. The back-compat bridge methods on StepProgressTracker (consecutive_empty, is_diminishing, record_step, guidance_text) are also removed; use the native consecutive_unproductive / is_stagnating properties.
Changed¶
- Documentation cleanup. Stale "v1 / v2 / legacy / cartridge" framing trimmed from public docstrings and comments now that the migration to the cartridge format is complete. No behavioural change. ROADMAP entries that have shipped are dropped.
- Coding and research presets now use DefaultCompactService. The preset path gets the same production compaction policy users can import directly: prune old tool payloads, summarize older context, keep recent steps verbatim, and report stage-level outcomes.
- Docs and shipped examples now teach DefaultCompactService first. README, tutorial, AGENTS guide, packaged examples, and bundled workspace compact_service.py resources now point to the coherent default service, leaving compact_chain(...) as the custom-policy escape hatch.
Added¶
- DefaultCompactService and default_compact_service(...). A clear production default for context compaction that composes PruneToolResults, SummarizeCompact, and deterministic truncate fallback into one inspectable service. CompactOutcome now reports session-log entry counts, compacted step ranges, summaries, and a JSON-able to_dict() used in POST_COMPACT event payloads.
- extends: workspace composition. A cartridge's config.yaml may now declare extends: <path>. At load time the parent workspace is recursively materialized and overlaid with the child via a tempdir; child files override parent files at matching paths. Multi-level inheritance works (grandparent → parent → child), cycles raise CartridgeSerializationError, and missing parent paths raise a clear error. (#44)
- examples/agent_factory.cartridge. First product built on extends:. Inherits all coder.cartridge tools and adds a validate_workspace tool that calls cartridge_to_preset() and returns structured success/error. ~4500-char system prompt teaching the cartridge v2 grammar, robustness rules, and extends: usage. (#44, #45)
- looplet.scaffold.scaffold_cartridge(). Plain Python helper that creates a working but stubbed cartridge skeleton in one call: cartridge.json + config.yaml + prompts/system.md + tools/<name>/{tool.yaml, execute.py} stubs (raise NotImplementedError) + the standard done tool. Idempotent: re-running preserves existing files via _write_if_absent. (#46)
- builtin_tools: directive in config.yaml. Cartridges can opt into looplet-shipped tools without writing a tools/<name>/ dir. Resolved at load time via looplet.builtin_tools.AVAILABLE. (#46)
- subagent built-in tool. Invokes another workspace as a sub-loop, sharing the parent's LLM and forwarding the parent's workspace_config.path as the sub-loop's runtime["workspace"]. Returns the sub-loop's final done summary. Recursion-guarded via contextvars.ContextVar (default max_depth=5). Sequential only; parallel fan-out deferred. (#46)
- scaffold_cartridge built-in tool. Agent-callable wrapper around the scaffold helper. The agent factory uses it as the very first tool call. (#46)
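The overlay rule for extends: (parents first, child files win, cycles rejected) can be sketched with an in-memory model. The data layout and function name here are hypothetical; the real loader works on directories via a tempdir and raises CartridgeSerializationError, whereas this sketch uses built-in exceptions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: workspace name -> (parent name or None, {relative path: content}).
WorkspaceEntry = Tuple[Optional[str], Dict[str, str]]


def materialize(name: str, workspaces: Dict[str, WorkspaceEntry]) -> Dict[str, str]:
    """Return the merged file map for a workspace, applying ancestors first."""
    chain = []
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"extends cycle detected at {current!r}")
        if current not in workspaces:
            raise FileNotFoundError(f"missing parent workspace: {current!r}")
        seen.add(current)
        chain.append(current)
        current = workspaces[current][0]

    merged: Dict[str, str] = {}
    for ws_name in reversed(chain):  # grandparent first, child last
        merged.update(workspaces[ws_name][1])  # later (child) files override
    return merged
```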
Fixed¶
- scaffold_cartridge wrote invalid JSON (cartridge.json emitted single-quoted Python repr instead of double-quoted JSON). Workspace.from_directory() and any external json.loads() failed. Now uses json.dumps(name) to emit RFC-compliant JSON.
- agent_factory _extract_json example in prompts/system.md was double-escaped inside an r-string (\\s instead of \s, \\[ instead of \[), so the regex matched nothing. Agents that copied the helper verbatim got a broken extractor. Fixed.
- subagent did not actually inherit parent runtime. Docstring promised workspace_config propagation; code read ctx.metadata["runtime"], which the loop never sets. Now reads the parent's workspace_config resource and forwards runtime["workspace"] to the sub-loop's resource builders.
- subagent tracked recursion depth via a process-global env var (LOOPLET_SUBAGENT_DEPTH), so two parallel parent loops in the same process raced. Replaced with a ContextVar (thread-safe and per-async-task). The sub-loop receives a freshly constructed runtime (it does NOT share the parent's resources / file_cache instances; only the cartridge path is forwarded).
- validate_workspace was silent on TODO-laden scaffolds. Now scans the system prompt for <TODO: markers and tool execute.py files for NotImplementedError("scaffold: and surfaces both as warnings, so agents can no longer call done on an unfilled skeleton. (#48)
- subagent cwd-fallback was silent. When neither the parent's workspace_config resource nor ctx.metadata['runtime'] is present, the sub-loop's runtime['workspace'] falls back to Path.cwd() AND the response now includes a structured warning field with explicit recovery hints. (#48)
- Tool name vs directory name mismatches (e.g. tools/foo/ whose tool.yaml declares name: WRONG_NAME) used to silently register the wrong name and leave the agent unable to use foo. The loader now warns in loose mode and raises CartridgeSerializationError in strict mode. (#48)
- Documentation cleanups (#48). subagent module docstring no longer claims to "share the parent's runtime" (it constructs a fresh one). builtin_tools/__init__.py now lists both shipped built-ins (subagent, scaffold_cartridge).
- Internal cleanup (#48). Removed redundant extends: line check; rewrote tempdir registry as module-level state with a single atexit.register; inlined a one-line _is_absolute helper; removed duplicate import; switched subagent.max_steps sentinel from 0 to None.
- examples/coder.cartridge per-tool guidance + safety. Three information-additive improvements modelled on patterns observed in production coding agents:
  - Read-required-first on edit_file. FileCache now tracks every path passed to read_file; edit_file refuses with a model-actionable error ({error, missing: "prior_read", recovery: "read_file(...)"}) when called on a file that hasn't been read in the current session. Editing without reading is the #1 cause of old_string mismatch failures.
  - bash safety classifiers. New classify_bash_command and classify_sed_command helpers in coder_lib_tools.py flag destructive command/flag combinations (rm -rf, git push --force, git reset --hard, shutdown, mkfs, …) and sed -i in-place edits (which bypass the file_cache and cause stale reads). The bash tool refuses both with a structured error pointing at a safer alternative (edit_file for in-place edits). The classifiers are exported so other cartridges can reuse them.
  - Rich per-tool descriptions. Every tool.yaml in examples/coder.cartridge rewritten as a multi-paragraph description (Usage / Refusals / Examples / Recovery sections) using YAML block scalars (|-). The looplet workspace YAML loader gained | / |- / > block-scalar support so these descriptions round-trip correctly.
- ToolError.recovery_hint: structured suggestion (dict or str) for how the caller could recover. The dispatcher now populates it on the four self-correctable errors: unknown-tool ("did you mean?"), unexpected-argument ({unexpected, did_you_mean, expected}), missing-argument ({missing, provided, expected}), and empty required-string-argument ({empty_param, expected}). Information-additive: smarter models exploit the structured hint to self-correct without re-discovering the catalog from prose; existing models still see the same human-readable error message.
- looplet.LLMResponsesExhausted + MockLLMBackend(cycle=False): opt-in test ergonomics. The default still cycles for backward compatibility; passing cycle=False makes the mock raise instead of silently re-using responses[0] past the last scripted answer (which previously made over-running loops look "stuck on step 1"). Same flag on AsyncMockLLMBackend.
- run_sub_loop(parent_hooks=...): opt-in event forwarding from a sub-loop to the parent's observability stack. When supplied, the parent's hooks (e.g. MetricsHook, StreamingHook, TrajectoryRecorder) receive every lifecycle event the sub-loop emits via their on_event method, tagged with subagent_id in the payload's extra dict so consumers can route / nest. Defaults to None: no forwarding, sub-loop fully isolated.
Changed¶
- tool.yaml requires: validated at workspace-load time. A typo in requires: [my_resoruce] (missing or mistyped resource name) used to silently set ctx.resources["my_resoruce"] = None at dispatch and crash deep inside the tool body with AttributeError. The loader now warns in loose mode (default) and raises CartridgeSerializationError in strict mode, naming the unresolvable resource and listing the available ones, which surfaces the bug at its source.
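A sketch of the loose/strict check described above, under the assumption that validation compares each declared name against the set of resources the loader built. The function name and the local exception class are stand-ins, not looplet's own code.

```python
import warnings
from typing import Iterable, List


class CartridgeSerializationError(Exception):
    """Local stand-in for the looplet exception of the same name."""


def check_requires(tool: str, requires: Iterable[str], available: Iterable[str],
                   strict: bool = False) -> List[str]:
    """Warn (loose) or raise (strict) when a tool requires an unknown resource."""
    known = sorted(set(available))
    missing = [name for name in requires if name not in known]
    if not missing:
        return []
    message = (
        f"tool {tool!r} requires unknown resource(s) {missing}; "
        f"available: {known}"
    )
    if strict:
        raise CartridgeSerializationError(message)
    warnings.warn(message)
    return missing
```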
Changed¶
- Naming consolidation. Dropped the legacy "cartridge" / "Composable Harness Workspace (CHW)" / "workspace v2" terminology in favour of the two canonical names already used in code: Workspace (the round-trippable directory format from looplet.cartridge) and SkillBundle (the runnable folder format from looplet.bundles). All docstrings, doc pages, README, and comments now use these names. The ClaudeSkillCompatibility.level string "looplet-cartridge" is renamed to "looplet-bundle", the only minor breaking change in this consolidation. Renamed tests/test_cartridge_round_trip_smoke.py to tests/test_skill_bundle_round_trip_smoke.py. No behavioural change; the _chw_* synthetic module-name prefixes (used internally by the cartridge loader) keep their names.
Removed (BREAKING)¶
- All setup.py files removed from shipped example cartridges. Every workspace under examples/*.workspace/ is now fully declarative; the imperative setup.py mechanism remains in the loader as the documented escape hatch for callers with truly imperative load-time wiring needs, but no shipped example needs one. Migrations:
  - coder.cartridge: tools moved from WORKSPACE_CONFIG / FILE_CACHE module-globals to requires: [...] in tool.yaml + ctx.resources[...] in execute.py. compact_service moved to resources/compact_service.py.
  - threat_intel.cartridge, dep_doctor.cartridge, git_detective.cartridge: compact_service moved to resources/compact_service.py. git_detective tools moved from REPO_CONFIG module-globals to requires: [repo_config] + ctx.resources["repo_config"].
  - hello.cartridge: greet tool moved from _GREETING_LOG module-global to requires: [greeting_log] + ctx.resources["greeting_log"].
v1 example modules deleted¶
- The legacy examples/coder/, examples/dep_doctor/, examples/git_detective/, and examples/threat_intel/ agent-CLI directories have been removed. Their tool functions, hook classes, and helpers now live inside the matching examples/<name>.workspace/ Composable Harness Cartridges as co-located helper modules (<wsname>_lib.py for the simple examples, coder_lib_{tools,hooks,wiring}.py for the coder one). The v2 cartridges are now the only published agent surface.
- The examples/coder/skill/ SkillBundle was relocated to tests/fixtures/coder_skill_bundle/ (with vendored sibling modules so it loads without any examples.coder.* import). All looplet.bundles / looplet.blueprints test coverage continues to exercise it via the new fixture path.
- Removed tests that targeted the deleted v1 modules: tests/test_coder_example_smoke.py, tests/test_coder_reliability_smoke.py, tests/test_dep_doctor_example_smoke.py, tests/test_git_detective_example_smoke.py, tests/test_threat_intel_example_smoke.py, and test_distributions_include_coder_cartridge_and_dependency from tests/test_cartridge_round_trip_smoke.py.
Changed¶
- Workspace loader pushes the cartridge root onto sys.path for the duration of cartridge_to_preset, so a cartridge's tools / hooks / resources / setup.py can from <wsname>_lib import X without a dedicated import shim. The path is removed on exit. Cartridges should pick a unique top-level lib filename (<wsname>_lib.py, not bare lib.py) to avoid sys.modules cache collisions when two cartridges are loaded back-to-back in the same process.
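The "push onto sys.path for the duration of the load, remove on exit" pattern can be written as a small context manager. This is a generic sketch with a hypothetical name, not the loader's actual code.

```python
import sys
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def on_sys_path(root):
    """Temporarily put a directory at the front of sys.path."""
    entry = str(Path(root).resolve())
    sys.path.insert(0, entry)
    try:
        yield entry
    finally:
        # Remove the entry we added; tolerate it having been removed already.
        try:
            sys.path.remove(entry)
        except ValueError:
            pass
```

Note that removing the path does not evict modules already imported from it, which is why the entry above recommends unique lib filenames: sys.modules keeps the first lib it saw.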
Added¶
- Cartridge discovery without import. discover_skill_bundles(roots) walks one or more roots and returns BundleCard records (name, description, entrypoint, tags, metadata, ok/errors) without importing the entrypoint. Powers the new python -m looplet list-bundles <roots…> CLI for product UIs and agent menus, with --json and --include-invalid modes.
- Eval cases as data. EvalCase, load_cases, save_case, pytest_param_cases, and the parametrize_cases(path) decorator let users write hand-edited JSON/JSONL cases that round-trip into pytest with their marks carried through. assert_evals_pass(ctx, evals) collapses the run/filter/pretty-print failure idiom into one call (with cached discovery for parametrized tests).
- looplet eval cases ls|show CLI subcommands for browsing case corpora directly from the terminal.
- Outcome-grounded evals. EvalContext.artifacts and EvalHook(collectors=…) let you grade what changed in the world (test results, repo diff) instead of grepping the trajectory. Trajectory directories may now ship an artifacts.json next to trajectory.json; EvalContext.from_trajectory_dir loads it automatically. Collectors that raise or return non-dicts are skipped silently, because observers must never break a run.
- AgentPreset.run(llm, …) convenience method drives composable_loop with the preset's wiring in one call.
- composable_loop(…, max_steps=N, system_prompt=…) keyword shorthands for inline agents that don't construct a LoopConfig.
- OpenAIBackend.from_env() / AnthropicBackend.from_env() / AsyncOpenAIBackend.from_env() classmethods that read OPENAI_* / ANTHROPIC_* env vars in one line.
- OpenAIBackend(api_key=…) no longer requires base_url; the cloud path now works with just an API key (or env vars).
- BaseToolRegistry.tool decorator registers a ToolSpec in one step, mirroring the module-level @tool decorator.
- save_cases(cases, directory) plural form symmetric with load_cases. Refuses to write when two cases share an id (which would silently overwrite each other on disk).
- metadata dict on ToolCall and ToolResult (PR #24) for carrying out-of-band tags through the loop without subclassing. Round-trips through to_dict().
- metadata dict on StepRecord and LLMCall (PR #19) for per-step / per-call annotations on saved trajectories.
- LifecycleEvent.HOOK_DECISION (PR #20) fires whenever a hook returns a non-noop HookDecision. Payload carries the slot, hook name, and serialized decision: a single observation point for every gate, redaction, or short-circuit in the run.
- LifecycleEvent.DONE_ACCEPTED (PR #21) fires after check_done accepts the done() call and the final payload is committed. Payload includes the tool_call and tool_result of the accepted termination. Observer-only, fired right before STOP.
- serialize_harness(...) + TrajectoryRecorder(harness_snapshot=…) (PR #22) record a stable JSON-friendly snapshot of the agent config, tool list, hooks, and LLM backend on every saved trajectory. Lands in trajectory.metadata["harness_snapshot"].
- tool_call kwarg on LoopHook.check_done (PR #23) so quality gates can inspect the agent's pending answer before it terminates. Backward-compatible: existing check_done(self, state, log, ctx, step_num) signatures continue to work via inspect.signature detection.
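The inspect.signature detection that keeps legacy check_done signatures working can be sketched like this. Helper names are hypothetical; the choice to treat a **kwargs parameter as accepting any keyword is an assumption of this sketch.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if func can be called with the keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):  # some builtins expose no signature
        return False
    for param in params.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs swallows any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


def call_check_done(hook, state, log, ctx, step_num, tool_call=None):
    """Pass tool_call only to hooks whose check_done can receive it."""
    if accepts_kwarg(hook.check_done, "tool_call"):
        return hook.check_done(state, log, ctx, step_num, tool_call=tool_call)
    return hook.check_done(state, log, ctx, step_num)
```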
Changed¶
- Coder example split into modules. examples/coder/agent.py now delegates to examples/coder/{tools,hooks,wiring}.py so the library entrypoint and the runnable cartridge in examples/coder/skill/ share exactly the same composition. Modify behavior in wiring.py once and both surfaces pick it up. Public symbols re-exported for back-compat.
- looplet eval subcommand now routes to looplet.evals.eval_cli with full -h / --help support; the top-level CLI no longer eats option-like tokens before they reach the eval parser. The eval help text now also documents the cases ls|show subcommands.
- Coder hooks default to "steer, don't restrict". TestGuardHook ships in observe-only mode (failures inject a briefing nudge but done() is never blocked); StagnationHook uses result_size_fingerprint with a lenient threshold so legitimate retries don't trip a stall warning. Pass test_strict=True to recover the legacy hard-block behavior.
- Coder example ships an outcome-grounded EvalHook that re-runs the project's pytest suite after the loop and surfaces tests_passing via ctx.artifacts.
Fixed¶
- EvalContext.from_trajectory_dir now preserves trajectory.metadata. Previously only four well-known top-level fields (run_id, started_at, ended_at, termination_reason) were copied into EvalContext.metadata, silently dropping harness_snapshot (added by PR #22's TrajectoryRecorder(harness_snapshot=…) kwarg) and any user-attached metadata. The full trajectory.metadata dict is now overlaid first, with the four top-level fields applied on top.
- TrajectoryRecorder now reflects late metadata mutations. When a downstream hook ran post_dispatch after TrajectoryRecorder and tagged tool_call.metadata / tool_result.metadata (the documented annotation point added by PR #24), the mutations were silently lost because the recorder had already snapshotted via to_dict(). The recorder now sweeps state.steps in on_loop_end and refreshes the metadata fields on every captured StepRecord so hook ordering no longer affects the saved trajectory.
- OpenAIBackend.from_env / AnthropicBackend.from_env / AsyncOpenAIBackend.from_env raise clean errors upfront. Previously, the OpenAI variants leaked an SDK-level OpenAIError when neither OPENAI_API_KEY nor OPENAI_BASE_URL was set, and AnthropicBackend.from_env raised a TypeError from looplet's own constructor. Both now raise a single RuntimeError with an actionable message naming the env var to set. The OpenAI variants also now default api_key to a sentinel when only OPENAI_BASE_URL is set, so local-server flows (Ollama / vLLM / llama.cpp) work without setting OPENAI_API_KEY=ollama.
- Coder example: clarified the eval_tests_passed skip reason. The label now says "no Python project (pyproject.toml/setup.py) detected in workspace; collector cannot re-run tests" instead of the misleading "no test runner detected", since the collector needs a project file to re-run pytest against.
- discover_skill_bundles accepts on_duplicate= and looplet list-bundles no longer crashes on collisions. Previously, two bundles claiming the same name field always raised ValueError, so a single dirty discovery root (e.g. left-over pytest fixtures under /tmp) made the entire looplet list-bundles CLI unusable. The function now accepts on_duplicate="raise" (default, back-compat), "first_wins" (silent), or "warn" (logs each collision to looplet.bundles). The CLI passes "warn" so users see what was dropped but still get a list of valid bundles.
- Conversation serialization now round-trips ToolCall / ToolResult metadata. PR #24 added the field but the Conversation serializer dropped it silently; Conversation.deserialize also never restored it. Both sides now plumb the field, so saved conversations preserve any out-of-band tags hooks attached.
- Message(role="system", …) no longer breaks serialization. MessageRole subclasses both str and Enum, so callers naturally pass plain strings, but _serialize_message did msg.role.value, which raised AttributeError when role was a plain str. Message.__post_init__ now coerces to MessageRole so both call styles work identically.
- check_done signature cache no longer poisoned by id reuse. PR #23's backward-compat dispatch cached _accepts_tool_call_kwarg results keyed on id(bound_method). Bound methods are ephemeral in CPython (obj.method creates a fresh object each access), so they get garbage-collected and their ids get reused for unrelated methods on other classes. That left the cache claiming a legacy-signature hook accepts tool_call, then raising TypeError: ...check_done() got an unexpected keyword argument 'tool_call'. The cache now keys on id(method.__func__) (the stable underlying function) with a fallback to id(method) for callables that lack __func__.
- async_composable_loop now accepts max_steps= and system_prompt= shorthands. The sync composable_loop got these convenience kwargs but the async version was missed, so callers had to construct a LoopConfig even for one-liner async agents. The signatures now match.
- generate_kwargs now reach backends declared as **kwargs. _accepts_kwarg only matched explicitly-named parameters in the backend's generate(...) signature, so any backend written as def generate(self, prompt, **kw) (a common permissive pattern) silently dropped every entry of LoopConfig.generate_kwargs. The helper now also returns True when the function declares a VAR_KEYWORD parameter, so top_p, response_format, chat_template_kwargs, etc. propagate as documented.
- save_case(case, "evals/cases/") no longer creates a file literally named cases. The "treat as directory" branch only fired when the path already existed, so a non-existent trailing-slash path (the obvious "I want a directory" convention shown in docs/evals.md) wrote the case content into a single file at the path. The helper now also treats trailing path separators as directory intent and creates the parent directories before writing <dir>/<case.id>.json.
- MetricsCollector.total_llm_calls is now populated by default. The field was advertised in the report but no built-in hook updated it, so it sat at 0 unless callers wired their own counter. MetricsHook.on_event now increments it on every POST_LLM_RESPONSE lifecycle event.
[0.1.8] - 2026-04-24¶
Added¶
- ctx.llm: tools receive the loop's LLM backend for internal calls. Tracked by RecordingLLMBackend with scope="tool:<name>" for nested provenance.
- LLMCall.scope: provenance field for loop vs tool-internal calls.
- state.step_context: per-step ephemeral dict for hook-to-hook communication.
- LoopConfig.tool_metadata: static dict merged into every ToolContext.metadata.
- LoopConfig.generate_kwargs: extra kwargs passed through to every LLM call. Can override temperature, max_tokens, system_prompt. Supports provider-specific params (chat_template_kwargs, response_format, top_p).
- async_composable_loop: async generator for async LLM backends.
- _SyncBridgeLLM: sync tools can use ctx.llm.generate() even with async backends.
- OpenAIBackend.tool_choice: configurable tool_choice parameter.
- PerToolLimitHook.default_limit: blanket cap for all tools.
- CompactOutcome.compacted: property indicating if compaction reduced context.
- register_done_tool(): convenience for registering the done tool.
- EvalResult.passed: property for pass/fail determination.
- Async tool dispatch in sync loop: dispatch() detects coroutine returns.
- 3 example agents: threat intel briefing, git history detective, dependency doctor.
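One way a synchronous dispatcher can detect coroutine returns is shown below. This is a simplified stand-in for the dispatch() mentioned above: it assumes no event loop is already running in the calling thread, which asyncio.run requires, and the real implementation may bridge differently.

```python
import asyncio
import inspect


def dispatch(tool, **kwargs):
    """Call a tool; if it returned a coroutine, run it to completion."""
    result = tool(**kwargs)
    if inspect.iscoroutine(result):
        # Assumption: we are in plain synchronous code with no running loop.
        return asyncio.run(result)
    return result


def add(a, b):
    return a + b


async def add_async(a, b):
    await asyncio.sleep(0)  # yield once to show it really runs on a loop
    return a + b
```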
Changed¶
- ToolContext is now always created (never None), with metadata populated from state.metadata (copy, not reference).
- default_max_tokens defaults to None across all backends, which lets the provider API decide instead of forcing 2000.
- All docs and examples updated to use convenience OpenAIBackend(base_url=...) and register_done_tool().
Fixed¶
- Step.to_dict() key names: call → tool_call, result → tool_result.
- Tool validation error now shows what args were provided.
- Step.summary() shows dict preview instead of ?.
- Trajectory.task field preserved in trajectory.json for eval round-trip.
- TrajectoryRecorder(output_dir=...) auto-saves on loop end.
- RecoveryRegistry.register warns on overwrite.
- clone_tools_excluding warns on missing names (typo detection).
- Permission audit strips __… scaffolding keys.
- Conversation compact() marks summary as compaction boundary.
[0.1.7] - 2026-04-21¶
First public release of looplet.
Added (launch polish)¶
- ROADMAP.md with a frozen v1.0 API contract and explicit out-of-scope list.
- docs/ site scaffold (tutorial, evals, recipes, hooks, good-first-issues, discussions-seed, demo-script) + mkdocs-material config + GitHub Pages workflow.
- THIRD_PARTY_USERS.md social-proof seed.
- src/looplet/examples/ollama_hello.py: zero-API-key onboarding.
- Codecov upload step in CI (non-blocking).
- Leaner README (<170 lines) with the pydantic-ai-harness disambiguation moved to the top.
Added (evals — pytest-style agent evaluation)¶
- Eval framework (looplet.evals). Write eval_* functions that take EvalContext and return any of float, bool, str, dict, or EvalResult. The framework normalizes all return types.
- eval_discover(path): auto-discovers eval functions in eval_*.py files (like pytest discovers test_*).
- eval_run(evals, ctx): runs evaluators, auto-detects llm parameter for LLM-as-judge, catches errors gracefully.
- eval_run_batch(evals, contexts): runs same evals across multiple trajectories with per-eval avg/min/max aggregation.
- eval_mark(*tags): decorator for categorizing evals. eval_run and eval_run_batch accept include= / exclude= to filter by marks.
- eval_cli(args): CLI runner with threshold-based pass/fail exit codes for CI integration.
- EvalHook: LoopHook that builds EvalContext at on_loop_end and runs all evaluators automatically during development.
- EvalContext.from_trajectory_dir(): loads context from saved trajectories with support for both looplet and benchmark formats.
Added (MCP + skills)¶
- MCPToolAdapter: wraps MCP server tools as ToolSpec instances via JSON-RPC over stdio. No MCP SDK required.
- Skill: bundles tools + context + prompt fragment into one loadable unit. skill.register(registry) adds all tools.
Added (approval)¶
- ApprovalHook: stops the loop when a tool returns needs_approval=True. Combined with checkpoint_dir for crash-safe async human-in-the-loop approval.
- Renamed elicit → approval uniformly: LoopConfig.approval_handler, ToolContext.request_approval, ToolContext.approve().
Changed (naming cleanup)¶
- Renamed internal names for clarity: coerce_text → to_text, DiminishingReturnsTracker → StallDetector, reactive_compact → emergency_truncate, compress_session_log → age_session_entries, enforce_result_budget → trim_results, should_compress_context → is_context_oversized, HEAVY_BLOCK_KINDS → LARGE_CONTENT_TYPES, DefaultSummarizer → default_summarizer.
- Renamed compact services: DefaultCompactService → TruncateCompact, LLMCompactService → SummarizeCompact.
- Renamed normalise_hook_return → normalize_hook_return.
- Moved concurrent_dispatch and reactive_recovery from FLAGS global singleton to LoopConfig fields.
- Trimmed __all__ from 154 → 54 symbols organized into labeled tiers.
Changed (developer experience)¶
- Added preview_prompt(): shows what the LLM sees before the first call. Invaluable for debugging.
- Added TrajectoryRecorder.summary(): one-liner run summary.
- Added --trace DIR to coding_agent example for trajectory recording.
- Added step-by-step tutorial to README (5 progressive steps).
- Added LoopConfig docstring with "start here" guide listing the 4 essential fields.
- Added FileCheckpointStore.load_latest() + auto-resume wiring in composable_loop. Crash-resume is now one line: LoopConfig(checkpoint_dir="./ckpt").
Removed¶
- Removed async_loop.py (feature-frozen, no consumers).
- Removed 3 mock examples (calculator, code_review, research). Replaced with hello_world.py (real LLM) + coding_agent.py (Claude Code-equivalent tools: bash, read, write, edit, glob, grep, think, done).
- Removed all back-compat aliases.
- Removed all internal project references (cadence, primal_security).
Added (compaction strategies)¶
- PruneToolResults: new zero-LLM-call compaction service that clears old tool-result content while keeping conversation structure intact. Configurable keep_recent (how many recent tool results to preserve) and compactable_tools (restrict to specific tools). Cheapest possible compaction; use as the first stage in a chain.
- compact_chain(*services): combinator that tries compaction services in order; first stage that has an effect wins. Replaces the need for a separate ChainedCompactService class. Usage: compact_chain(PruneToolResults(), SummarizeCompact(), TruncateCompact()).
- CompactOutcome.cleanup: optional post-compact callback. When set, run_compact() invokes it after firing POST_COMPACT. Use for domain-specific state resets (clear caches, re-inject context, reset token baselines) without the loop knowing details.
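The "first stage that has an effect wins" rule of compact_chain can be illustrated with plain functions over a list of log entries. This is a simplified model: the real services are objects returning CompactOutcome, the stage functions below are stand-ins, and "has an effect" is assumed here to mean "the output differs from the input".

```python
from typing import Callable, List

Entries = List[str]
Compactor = Callable[[Entries], Entries]


def compact_chain(*stages: Compactor) -> Compactor:
    """Try each stage in order and return the first result that changed anything."""

    def run(entries: Entries) -> Entries:
        for stage in stages:
            result = stage(entries)
            if result != entries:  # this stage had an effect, so stop here
                return result
        return entries  # nothing could compact further

    return run


def prune_tool_results(entries: Entries) -> Entries:
    """Stand-in: blank every tool result except the most recent one."""
    tool_idx = [i for i, e in enumerate(entries) if e.startswith("tool:")]
    out = list(entries)
    for i in tool_idx[:-1]:
        out[i] = "tool:<pruned>"
    return out


def truncate(entries: Entries, keep: int = 2) -> Entries:
    """Stand-in: keep only the last `keep` entries."""
    return entries[-keep:]
```

Putting the cheapest stage first means the more destructive truncation only runs when pruning has nothing left to remove.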
Changed (renames — back-compat aliases kept)¶
- DefaultCompactService → TruncateCompact: clearer name for "drop old entries, keep N recent, zero LLM calls."
- LLMCompactService → SummarizeCompact: clearer name for "LLM summarizes middle, keeps N recent."
- Old names (DefaultCompactService, LLMCompactService) remain as aliases and continue to work.
Added (context management pt. 2)¶
- Prompt caching infrastructure (looplet.cache). New CachePolicy dataclass declares which stable prompt sections (system prompt, tool schemas, memory) should carry Anthropic-style cache_control markers, with per-section TTL (ephemeral / 1h). LoopConfig.cache_policy threads per-turn CacheBreakpoint lists (label + SHA-256 hash + TTL) to backends that expose generate_with_cache(..., cache_breakpoints=[...]). Backends without the kwarg keep working unchanged; caching is strictly additive. CacheBreakDetector ships as a drop-in observer hook that records section-hash changes across turns for cache-miss telemetry.
- LLMCompactService: new compaction strategy that spends one LLM call to summarise the session. Produces a dense 4-section summary (task goal, findings, open questions, recent decisions) spliced into the session log as a synthetic entry after keep-recent pruning. Falls back to deterministic keep-recent on any summariser error. Trade-off vs DefaultCompactService: one LLM call per compaction for preserved reasoning chains.
- Threshold-tier context budgeting (looplet.budget). New ContextBudget dataclass with warning_at / error_at / compact_buffer tiers. ThresholdCompactHook is a ready-to-register should_compact implementation that fires proactive compaction once estimated tokens cross the configured tier. BudgetTelemetry observer records per-step tier samples and exposes peak_tier for production dashboards.
Added (architecture improvements)¶
- Proactive compact hook slot: LoopHook.should_compact(state, session_log, conversation, step_num) -> bool. Fires at the top of each step, before prompt build. Any hook returning True triggers the configured CompactService preemptively. Complements the reactive prompt_too_long path; use it for message-count or token-estimate heuristics. StreamingHook gets a no-op stub.
- Tool-result streaming via TOOL_PROGRESS: new LifecycleEvent.TOOL_PROGRESS. When hooks are present, the loop builds a ToolContext.on_progress callback per tool call that emits TOOL_PROGRESS (with the originating tool_call) whenever the tool invokes ctx.report_progress(stage, data). Observers can stream intermediate output from long-running tools without blocking dispatch.
- Budget-aware turn continuation: new LoopConfig.max_turn_continuations: int = 0. When > 0 and the backend exposes last_stop_reason, llm_call_with_retry will re-prompt up to N times on stop_reason == "max_tokens" and concatenate outputs so long thoughts aren't truncated mid-message. LLMResult gains stop_reason and continuations fields.
- build_briefing / build_prompt as hook slots: both are now optional methods on LoopHook. First hook returning a non-None string wins; the loop falls back to LoopConfig.build_briefing / config.build_prompt / the built-in default. Lets domain hooks own prompt construction without threading callables through LoopConfig separately.
- DomainAdapter: new dataclass bundling the five domain callables (build_briefing, extract_entities, build_trace, build_prompt, extract_step_metadata) into a single object. LoopConfig.domain: DomainAdapter | None = None seeds matching flat fields when they are None. Flat fields still win over the adapter, which wins over built-in defaults; use the adapter to package a reusable agent in one handle instead of five kwargs.
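The budget-aware turn continuation entry above describes a re-prompt-and-concatenate loop. The following is a minimal sketch of that idea under stated assumptions: ResultSketch, call_with_continuation, and fake_generate are hypothetical names, the backend is modelled as a plain callable returning (text, stop_reason), and feeding the accumulated text back into the prompt is an assumed continuation strategy rather than what llm_call_with_retry necessarily does.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ResultSketch:
    """Hypothetical result carrying the two fields the changelog names."""

    text: str
    stop_reason: str
    continuations: int


def call_with_continuation(
    generate: Callable[[str], Tuple[str, str]],
    prompt: str,
    max_turn_continuations: int = 0,
) -> ResultSketch:
    """Re-prompt while the backend stops on max_tokens, joining the outputs."""
    text, stop_reason = generate(prompt)
    continuations = 0
    while stop_reason == "max_tokens" and continuations < max_turn_continuations:
        # Assumed strategy: hand back everything produced so far.
        more, stop_reason = generate(prompt + text)
        text += more
        continuations += 1
    return ResultSketch(text, stop_reason, continuations)


# Scripted fake backend: two truncated chunks, then a clean finish.
_script = iter([("Hello, ", "max_tokens"), ("wor", "max_tokens"), ("ld.", "end_turn")])


def fake_generate(prompt: str) -> Tuple[str, str]:
    return next(_script)


result = call_with_continuation(fake_generate, "Say hello", max_turn_continuations=2)
```

With the default of 0 the loop body never runs, which matches the changelog's description of continuation as opt-in.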
Removed (breaking)¶
- InvestigationLog backward-compat alias is gone; use SessionLog directly.
- HARNESS_FLAGS backward-compat alias is gone; use FLAGS.
- Legacy CADENCE_* environment variables for feature flags are no longer read; use the LOOPLET_* prefix.
- _clone_tools_excluding private alias is gone; use clone_tools_excluding.
- LoopConfig.permissions is gone. Register a PermissionHook(PermissionEngine(...)) in hooks=[...] instead; it flows through the same unified HookDecision + event bus as every other hook.
Added¶
- Unified hook vocabulary: HookDecision (looplet.hook_decision). All hook slots now accept a single HookDecision return type (legacy None / bool / str returns still work via normalise_hook_return). Helpers Allow(), Deny(reason), Block(reason), Stop(reason), Continue(), InjectContext(text) make intent explicit at the call site.
- Lifecycle events: on_event(payload) (looplet.events). LoopHook gained an optional on_event(EventPayload) method. The loop now fires 11 named events: SESSION_START, PRE_LLM_CALL, POST_LLM_RESPONSE, PRE_TOOL_USE, POST_TOOL_USE, POST_TOOL_FAILURE, PRE_COMPACT, POST_COMPACT, STOP, SUBAGENT_START, SUBAGENT_STOP. Any hook can subscribe with a single method instead of implementing every slot.
- PermissionHook (looplet.permissions): wraps PermissionEngine and plugs it into the event bus so policy decisions flow through the same HookDecision path as custom hooks.
- CompactService + DefaultCompactService + run_compact(...) (looplet.compact): reactive compaction is now a swappable service with PRE_COMPACT / POST_COMPACT events.
- LoopConfig.render_messages_override: byte-exact escape hatch. Receives (messages, default_prompt, step_num) and returns the exact prompt string sent to the LLM. Lets advanced callers take full control of prompt rendering without forking the loop.
- First-class subagents: run_sub_loop(..., subagent_id=...) now fires SUBAGENT_START / SUBAGENT_STOP events on the parent's hooks and returns subagent_id in the result dict for correlation.
- replay_loop(trace_dir, tools=...): rerun a captured trace through a fresh composable_loop without calling the LLM again. Useful for golden-trajectory regression tests, hook A/Bs, and cost-free loop diffs. Raises RuntimeError if the replay loop requests more calls than were recorded or diverges in method (generate vs generate_with_tools). Falls back to call_NN_response.txt files when manifest.jsonl is missing.
- python -m looplet show <trace-dir>: stdlib-only CLI that prints a one-page summary of a captured trace (run id, termination, per-step tool calls with durations, LLM totals). Exit code 1 when the directory is missing or malformed.
- looplet.provenance: new module for debugging agent runs:
  - RecordingLLMBackend / AsyncRecordingLLMBackend wrap any backend and capture every prompt, system prompt, tool schema, response, duration, and error as LLMCall records. generate_with_tools is surfaced only when the wrapped backend supports it, so NativeToolBackend detection stays honest.
  - TrajectoryRecorder hook captures a structured Trajectory per run (steps, context-before, termination reason, embedded Tracer spans) and writes trajectory.json + steps/step_NN.json.
  - ProvenanceSink is a 3-line facade: wrap_llm(...), trajectory_hook(), flush().
  - On-disk layout is diff-friendly: call_NN_prompt.txt / call_NN_response.txt per LLM call plus a manifest.jsonl.
  - Both recorders accept redact= for secret scrubbing and max_chars_per_call= for bounded memory.
  - See Provenance guide for API reference, recipes, and performance notes.
- Step.pretty(): human-readable CLI formatter complementing Step.summary() (which is tuned for LLM context assembly).
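The two failure modes listed for replay_loop (more calls requested than were recorded, and a divergence between generate and generate_with_tools) can be illustrated with a small stand-in backend. This is only a sketch: ReplayBackendSketch and its in-memory list of (method, response) pairs are hypothetical and do not reflect how looplet reads manifest.jsonl or the call_NN files.

```python
from typing import List, Tuple


class ReplayBackendSketch:
    """Serves recorded (method, response) pairs instead of calling an LLM."""

    def __init__(self, recorded: List[Tuple[str, str]]) -> None:
        self._recorded = list(recorded)
        self._cursor = 0

    def _next(self, method: str) -> str:
        if self._cursor >= len(self._recorded):
            raise RuntimeError("replay requested more calls than were recorded")
        recorded_method, response = self._recorded[self._cursor]
        if recorded_method != method:
            raise RuntimeError(
                f"replay diverged: recorded {recorded_method!r}, got {method!r}"
            )
        self._cursor += 1
        return response

    def generate(self, prompt: str) -> str:
        return self._next("generate")

    def generate_with_tools(self, prompt: str, tools: list) -> str:
        return self._next("generate_with_tools")


backend = ReplayBackendSketch([("generate", "plan"), ("generate_with_tools", "call")])
first = backend.generate("step 1")
second = backend.generate_with_tools("step 2", tools=[])
```

A third call on this backend raises RuntimeError, which is the property that makes a replay usable as a regression test: any change that alters the number or kind of LLM calls fails loudly instead of silently drifting.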
[0.1.6] - 2026-04-17¶
Added¶
- looplet.testing: public test-utility module exposing MockLLMBackend and AsyncMockLLMBackend (scripted, zero-dependency) so downstream packages can unit-test hooks, tools, and backends without a real LLM provider.
- PyPI publish workflow (.github/workflows/publish.yml) that builds and publishes on version tags via PyPI trusted publishing.
- README positioning matrix comparing looplet to LangGraph, DSPy, and smolagents; observability/OTel wiring example; stability & versioning policy; real AnthropicBackend usage in quick-start.
Fixed¶
- resume_loop_state() now restores the checkpointed Conversation thread (was silently dropping multi-turn message history on resume).
- RoutingLLMBackend.generate_with_tools is now gated dynamically via __getattr__ so hasattr(llm, "generate_with_tools") returns a truthful answer for the currently-selected backend (consistent with _FallbackLLM and CostTracker).
- Async __llm_error__ step is now recorded through _history to match the sync loop (previously caused session-log/conversation drift on LLM failure).
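The __getattr__ gating mentioned for RoutingLLMBackend relies on a standard Python behaviour: __getattr__ is consulted only when normal attribute lookup fails, and hasattr() reports False when it raises AttributeError. A minimal sketch of the pattern follows; RouterSketch, TextOnlyBackend, and ToolBackend are hypothetical classes written for this example, not looplet's own.

```python
class TextOnlyBackend:
    def generate(self, prompt: str) -> str:
        return "text:" + prompt


class ToolBackend(TextOnlyBackend):
    def generate_with_tools(self, prompt: str, tools: list) -> str:
        return "tools:" + prompt


class RouterSketch:
    """Hypothetical router that forwards to whichever backend is selected."""

    def __init__(self, backends: dict, selected: str) -> None:
        self._backends = backends
        self.selected = selected

    def generate(self, prompt: str) -> str:
        return self._backends[self.selected].generate(prompt)

    def __getattr__(self, name: str):
        # Reached only when normal lookup fails, so generate_with_tools is
        # resolved against the currently selected backend on every access.
        if name == "generate_with_tools":
            return getattr(self._backends[self.selected], name)
        raise AttributeError(name)


router = RouterSketch(
    {"plain": TextOnlyBackend(), "tools": ToolBackend()}, selected="plain"
)
before = hasattr(router, "generate_with_tools")
router.selected = "tools"
after = hasattr(router, "generate_with_tools")
```

Defining generate_with_tools as an ordinary method on the router would make hasattr() always return True, which is the misreport this kind of gating avoids.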
Previously added in this release¶
- ToolError taxonomy: structured ErrorKind enum (PERMISSION_DENIED, TIMEOUT, VALIDATION, EXECUTION, PARSE, CONTEXT_OVERFLOW, RATE_LIMIT, NETWORK, CANCELLED) plus a ToolError dataclass. ToolResult now carries both error: str (for JSON-safe display) and error_detail: ToolError (for introspection).
- PermissionEngine: declarative ALLOW / DENY / ASK / DEFAULT rules with fail-closed arg_matcher, plug-in ask_handler for human-in-the-loop, and an append-only denial audit log.
- CancelToken: cooperative cancellation is now threaded through LoopConfig → llm_call_with_retry / async_llm_call_with_retry → ToolContext.cancel_token, so both the next LLM call and any in-flight tool can stop cleanly.
- ToolContext.elicit: LoopConfig.elicit_handler surfaces a generic elicit(prompt) → str protocol to tools for interactive prompts.
- Multi-block messages: Message.content supports a list of ContentBlock(kind, data) alongside plain str. HEAVY_BLOCK_KINDS (image / audio / video / binary) are stripped before summarization.
- Async build_trace: async_composable_loop now stashes the built trace on state.trace at exit (async generators can't return a value).
- SyncToAsyncAdapter.generate_with_tools: router-selected sync backends keep native-tools support in the async loop.
- Preflight context check: async loop matches sync by skipping a doomed LLM call when the prompt is already too long under FLAGS.reactive_recovery.
- Checkpoint state counters: resume_loop_state now round-trips state.queries_used and state.budget_remaining so budget enforcement continues across resume.
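Cooperative cancellation, as described in the CancelToken entry above, means the running code checks a shared flag at safe points rather than being interrupted. The sketch below shows the general pattern only: CancelTokenSketch and slow_tool are hypothetical, and backing the token with threading.Event is an assumption, not a statement about how looplet's CancelToken is built.

```python
import threading


class CancelTokenSketch:
    """Hypothetical token: a thread-safe flag that tools poll between steps."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()


def slow_tool(items: list, token: CancelTokenSketch) -> list:
    """Process items one at a time, checking the token before each unit."""
    done = []
    for item in items:
        if token.cancelled:
            break  # stop cleanly, keeping the work finished so far
        done.append(item * 2)
        if item == 2:
            token.cancel()  # simulate a cancel request arriving mid-run
    return done


token = CancelTokenSketch()
partial = slow_tool([1, 2, 3, 4], token)
```

The tool returns the results it completed before the request was observed, which is what lets an in-flight tool stop without leaving half-finished state behind.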
Changed¶
- ToolResult.error narrowed back to str | None (JSON-safe). Use ToolResult.error_detail for structured introspection.
- PermissionRule.matches() now fails closed per decision type: DENY rules match on matcher errors (block), ALLOW / ASK rules do not (don't accidentally grant).
- PermissionEngine._resolve_default collapses ambiguous engine defaults (ASK / DEFAULT) to DENY so a decision never leaks into a PermissionOutcome where both .allowed and .denied are False.
- ToolSpec._accepts_ctx is computed eagerly at register() time (and self-heals in dispatch() for specs inserted directly).
- _backend_accepts_cancel_token cache keyed by (type, method_name) instead of id() (eliminates id-recycling hazard).
- _classify_exception broadened to detect asyncio.CancelledError, rate-limit, context-overflow, and parse exceptions by class name / message content.
- SyncToAsyncAdapter._adapter_cache now prefers the backend object itself as the dict key, with id() as a fallback for unhashable backends.
- SessionLog.to_list() includes recall_key for full round-trip through checkpoints.
- ToolError.context now round-trips through Conversation.serialize / deserialize.
- Permission-denied results from hooks now populate error_detail with ErrorKind.PERMISSION_DENIED (parity with the PermissionEngine path) in both sync and async loops.
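The per-decision fail-closed rule for PermissionRule.matches() is easiest to see with a matcher that raises. The sketch below assumes a simplified rule shape: RuleSketch, its string decision field, and the path_is_tmp matcher are hypothetical and stand in for looplet's real types.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RuleSketch:
    """Hypothetical rule: a decision plus a predicate over tool arguments."""

    decision: str                        # "ALLOW", "DENY", or "ASK"
    arg_matcher: Callable[[dict], bool]

    def matches(self, args: dict) -> bool:
        try:
            return bool(self.arg_matcher(args))
        except Exception:
            # Fail closed: a broken DENY matcher still blocks,
            # a broken ALLOW / ASK matcher never grants.
            return self.decision == "DENY"


def path_is_tmp(args: dict) -> bool:
    return args["path"].startswith("/tmp/")  # raises KeyError without "path"


allow = RuleSketch("ALLOW", path_is_tmp)
deny = RuleSketch("DENY", path_is_tmp)
```

Calling matches({}) makes the matcher raise KeyError in both rules; the DENY rule then counts as matched and the ALLOW rule does not, so a malformed call can only ever end up more restricted.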
Fixed¶
- _rebuild_prompt now renders memory and falls back to the structured build_prompt from looplet.prompts instead of a bare f-string, restoring parity with the first-pass build.
- _deserialize_message now reconstructs ToolError from serialized error_kind / error_retriable / error_context fields.
- _NullSessionLog (async) gained the attributes the async loop expects: entries, current_theory, to_list(), compact().
[0.1.5] - initial public import¶
- Initial release as a standalone package. See the extraction commit history for the pre-extraction development timeline.