diff --git a/.codeflash/krrt7/microsoft/typeagent/README.md b/.codeflash/krrt7/microsoft/typeagent/README.md
index af05ebe..4f2fe15 100644
--- a/.codeflash/krrt7/microsoft/typeagent/README.md
+++ b/.codeflash/krrt7/microsoft/typeagent/README.md
@@ -138,11 +138,11 @@ Stacked PRs — merge in order (#231 first, each subsequent PR builds on the pre
 |---|---|---|---|
 | 1 | [microsoft/typeagent-py#231](https://github.com/microsoft/typeagent-py/pull/231) | **Merged** | Fix parse_azure_endpoint passing query string to AsyncAzureOpenAI |
 | 2 | [microsoft/typeagent-py#229](https://github.com/microsoft/typeagent-py/pull/229) | **Merged** | Defer black import to first use |
-| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | Open | Batch SQLite INSERTs for indexing pipeline |
-| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Open | Batch metadata query to avoid N+1 across 5 call sites |
+| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | In review | Batch SQLite INSERTs for indexing pipeline (reviewer: @bmerkle, comments addressed) |
+| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Blocked on #230 | Batch metadata query to avoid N+1 across 5 call sites (reviewer: @bmerkle, 4 comments — will rebase after #230 merges) |
 | 5 | [microsoft/typeagent-py#234](https://github.com/microsoft/typeagent-py/pull/234) | **Merged** | Vectorize fuzzy_lookup_embedding with numpy ops (see [Lessons Learned](#lessons-learned)) |
-| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | Open | Replace black with stdlib pprint for runtime formatting |
-| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | Open | Defer query-time imports in conversation_base |
+| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | **Merged** | Replace black with stdlib pprint for runtime formatting |
+| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | **Closed** | Defer query-time imports in conversation_base (rejected — maintainer said savings not worth complexity) |
 | — | [shreejaykurhade/typeagent-py#1](https://github.com/shreejaykurhade/typeagent-py/pull/1) | **Merged** | Numpy vectorized fuzzy lookup (against PR #228's fork — see [Lessons Learned](#lessons-learned)) |
 
 ### Fork PRs
diff --git a/README.md b/README.md
index 668f8bc..ff99a1a 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,7 @@ Autonomous performance optimization platform. Profiles code, implements optimiza
 |---|---|---|
 | Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
 | pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/pypa/pip/summary.md) |
+| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
 
 ## Domains
diff --git a/case-studies/microsoft/typeagent/summary.md b/case-studies/microsoft/typeagent/summary.md
new file mode 100644
index 0000000..2218748
--- /dev/null
+++ b/case-studies/microsoft/typeagent/summary.md
@@ -0,0 +1,83 @@
+# typeagent-py Optimization — Lessons Learned
+
+Full case study: [.codeflash/krrt7/microsoft/typeagent/](../../../.codeflash/krrt7/microsoft/typeagent/)
+
+## Context
+
+[typeagent-py](https://github.com/microsoft/typeagent-py) is Microsoft's Python library for structured RAG — ingesting conversations, building semantic indexes, and querying them with LLM-backed answer generation. Maintained by Guido van Rossum.
+
+We targeted three areas: import time (heavy eager imports), indexing pipeline (N+1 SQLite inserts), and query path (N+1 metadata fetches with unnecessary deserialization).
+
+## What we did (by impact)
+
+### Query batching (2.10x–2.62x)
+
+Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern). Profiling showed 64% of per-row cost was deserializing `knowledge_json`, which the filters never use.
+
+Added `get_metadata_multiple`, which fetches only `semref_id, range_json, knowledge_type` in a single batch query. Further optimized scope-filtering with binary search in `TextRangeCollection.contains_range` and inline tuple comparisons in `TextRange`.
+
+### Import time (1.16x — 791ms to 683ms)
+
+1. **Deferred `black` import**: `black` was imported at module level but only used in cold formatting paths. Moved inside functions (−78ms)
+2. **Replaced `black` with stdlib `pprint`**: Removed `black` from runtime dependencies entirely
+3. **Deferred query-time imports**: Moved 4 query-only modules behind `TYPE_CHECKING` guards (−51ms) — **rejected by maintainer** as not worth the complexity given that the LLM roundtrip dominates
+
+### Indexing pipeline (1.14x–1.16x)
+
+Batched SQLite INSERTs via `executemany` instead of individual `cursor.execute()` calls per term/property. Reduced 1000+ INSERT calls to 2–3 for 200 messages.
+
+### Bugfix
+
+Fixed `parse_azure_endpoint` passing the query string to `AsyncAzureOpenAI`, producing mangled URLs. Added 6 unit tests.
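The batched metadata fetch described above can be sketched with the stdlib `sqlite3` module. The selected columns (`semref_id`, `range_json`, `knowledge_type`) come from the text; the table name, schema, and exact `get_metadata_multiple` signature are illustrative assumptions, not the upstream API.

```python
import sqlite3

def get_metadata_multiple(conn, semref_ids):
    """Fetch metadata for many scored refs in one query instead of
    one SELECT (plus full deserialization) per ref."""
    if not semref_ids:
        return []
    placeholders = ",".join("?" * len(semref_ids))
    # knowledge_json is deliberately not selected: deserializing it was
    # 64% of the per-row cost, and the scope filters never read it.
    return conn.execute(
        "SELECT semref_id, range_json, knowledge_type"
        f" FROM semantic_refs WHERE semref_id IN ({placeholders})",
        list(semref_ids),
    ).fetchall()

# Demo against an in-memory database with an illustrative schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE semantic_refs (semref_id INTEGER PRIMARY KEY,"
    " range_json TEXT, knowledge_type TEXT, knowledge_json TEXT)"
)
conn.executemany(
    "INSERT INTO semantic_refs VALUES (?, ?, ?, ?)",
    [(i, "{}", "entity", "{}") for i in range(200)],
)
rows = get_metadata_multiple(conn, [3, 7, 42])
```

One round trip replaces N, and each returned row skips the expensive `knowledge_json` parse entirely.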
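A minimal sketch of the `executemany` batching from the indexing section; the `term_index` table and row shape here are hypothetical stand-ins for the real index tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_index (term TEXT, msg_id INTEGER)")

terms = [(f"term{i}", i % 200) for i in range(1000)]

# Before: one cursor.execute() per term, i.e. 1000+ separate INSERT
# calls for 200 messages:
#   for term, msg_id in terms:
#       conn.execute("INSERT INTO term_index VALUES (?, ?)", (term, msg_id))

# After: compile the statement once and bind every row in a single call.
conn.executemany("INSERT INTO term_index VALUES (?, ?)", terms)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM term_index").fetchone()[0]
```

The statement is prepared once and each row is bound inside the SQLite C layer, which is where the indexing-pipeline speedup comes from.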
+
+## Results
+
+| Category | Benchmark | Before | After | Speedup |
+|---|---|---:|---:|---:|
+| Import | `import typeagent` | 791 ms | 683 ms | **1.16x** |
+| Query | `lookup_term_filtered` (200 matches) | 2.652 ms | 1.260 ms | **2.10x** |
+| Query | `group_matches_by_type` (200 matches) | 2.453 ms | 0.992 ms | **2.47x** |
+| Query | `lookup_property_in_property_index` (200 matches) | 24.5 ms | 9.4 ms | **2.61x** |
+| Query | `get_matches_in_scope` (200 matches) | 24.1 ms | 9.2 ms | **2.62x** |
+| Indexing | `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
+| E2E | 69 offline tests | 5.72 s | 5.60 s | **1.02x** |
+
+## Upstream status
+
+| PR | Status | Description |
+|---|---|---|
+| #231 | **Merged** | Fix parse_azure_endpoint |
+| #229 | **Merged** | Defer black import |
+| #234 | **Merged** | Vectorize fuzzy_lookup_embedding (see failure below) |
+| #235 | **Merged** | Replace black with stdlib pprint |
+| #236 | **Closed** | Defer query imports (rejected — not worth complexity) |
+| #230 | In review | Batch SQLite INSERTs |
+| #232 | Blocked on #230 | Batch metadata query |
+
+4 of 7 upstream PRs merged, 1 rejected, 2 in review.
+
+## Key takeaways
+
+### 1. Validate bottlenecks against real workloads, not synthetic benchmarks
+
+We optimized vector search hot paths in a community contributor's PR (#228), achieving 3.7x–14.2x speedups on micro-benchmarks. Guido pointed out that model API calls and SQL queries dominate real-world latency — the numpy iteration time is noise. A 14x speedup on a non-bottleneck is worth 0x in practice.
+
+Similarly, #236 (defer query imports, −51ms) was rejected because ingestion takes seconds and queries always need those modules anyway. The maintainer's perspective on what matters was more accurate than our profiler data.
+
+### 2. Don't optimize unmerged third-party code
+
+PR #228 was third-party work with its own issues — Guido later showed with real evals that the proposed min_score tuning made results *worse*. The contributor merged our vector search PR without review ("Noice"). We picked a target based on what was convenient to optimize, not what the project needed.
+
+### 3. Maintainer engagement style matters
+
+Guido's review style is direct and opinionated. He rejected `from __future__ import annotations` on principle (it will be deprecated), rejected import-time optimizations as not worth the complexity, and expected us to justify why a bottleneck matters before optimizing it. Engaging early with "is this worth doing?" (#229 comment) was the right approach — we should have done that for every PR, not just the first one.
+
+### 4. Code review feedback is where the real value shows
+
+@bmerkle's review of #230 caught a real bug (inverse_actions not indexed) and identified code duplication. Our response to that review — fixing the bug, eliminating duplicated functions — was arguably more valuable than the original optimization.
+
+## Applicable to codeflash
+
+- **N+1 query patterns**: The `get_metadata_multiple` pattern (fetch only needed columns in batch, skip expensive deserialization) applies to any codeflash code that iterates over DB results and deserializes each one
+- **Validate before optimizing**: Build maintainer alignment into the workflow — profile the real workload, confirm the bottleneck matters, then optimize. Don't let impressive micro-benchmark numbers drive target selection
+- **Import deferral has limits**: For libraries where LLM roundtrips dominate, shaving 50ms off import time isn't worth code complexity. Know the latency budget before investing