Add typeagent-py case study (#17)

- Add case-studies/microsoft/typeagent/summary.md with results, lessons
  learned (failed vector search experiment, maintainer alignment), and
  takeaways for codeflash
- Update upstream PR statuses: #235 merged, #236 closed (rejected),
  #232 blocked on #230
- Add typeagent to main README results table
Kevin Turcios 2026-04-14 23:25:29 -05:00 committed by GitHub
parent 6dd3b02168
commit 09ba9b44b2
3 changed files with 88 additions and 4 deletions


@@ -138,11 +138,11 @@ Stacked PRs — merge in order (#231 first, each subsequent PR builds on the pre
|---|---|---|---|
| 1 | [microsoft/typeagent-py#231](https://github.com/microsoft/typeagent-py/pull/231) | **Merged** | Fix parse_azure_endpoint passing query string to AsyncAzureOpenAI |
| 2 | [microsoft/typeagent-py#229](https://github.com/microsoft/typeagent-py/pull/229) | **Merged** | Defer black import to first use |
-| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | Open | Batch SQLite INSERTs for indexing pipeline |
-| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Open | Batch metadata query to avoid N+1 across 5 call sites |
+| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | In review | Batch SQLite INSERTs for indexing pipeline (reviewer: @bmerkle, comments addressed) |
+| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Blocked on #230 | Batch metadata query to avoid N+1 across 5 call sites (reviewer: @bmerkle, 4 comments — will rebase after #230 merges) |
| 5 | [microsoft/typeagent-py#234](https://github.com/microsoft/typeagent-py/pull/234) | **Merged** | Vectorize fuzzy_lookup_embedding with numpy ops (see [Lessons Learned](#lessons-learned)) |
-| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | Open | Replace black with stdlib pprint for runtime formatting |
-| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | Open | Defer query-time imports in conversation_base |
+| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | **Merged** | Replace black with stdlib pprint for runtime formatting |
+| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | **Closed** | Defer query-time imports in conversation_base (rejected — maintainer said savings not worth complexity) |
| — | [shreejaykurhade/typeagent-py#1](https://github.com/shreejaykurhade/typeagent-py/pull/1) | **Merged** | Numpy vectorized fuzzy lookup (against PR #228's fork — see [Lessons Learned](#lessons-learned)) |
### Fork PRs


@@ -8,6 +8,7 @@ Autonomous performance optimization platform. Profiles code, implements optimiza
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/pypa/pip/summary.md) |
+| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
## Domains


@@ -0,0 +1,83 @@
# typeagent-py Optimization — Lessons Learned
Full case study: [.codeflash/krrt7/microsoft/typeagent/](../../../.codeflash/krrt7/microsoft/typeagent/)
## Context
[typeagent-py](https://github.com/microsoft/typeagent-py) is Microsoft's Python library for structured RAG — ingesting conversations, building semantic indexes, and querying them with LLM-backed answer generation. Maintained by Guido van Rossum.
We targeted three areas: import time (heavy eager imports), indexing pipeline (N+1 SQLite inserts), and query path (N+1 metadata fetches with unnecessary deserialization).
## What we did (by impact)
### Query batching (2.10x–2.62x)
Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern). Profiling showed 64% of per-row cost was deserializing `knowledge_json`, which the filters never use.
Added `get_metadata_multiple` that fetches only `semref_id, range_json, knowledge_type` in a single batch query. Further optimized scope-filtering with binary search in `TextRangeCollection.contains_range` and inline tuple comparisons in `TextRange`.
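A minimal sketch of the shape (assuming a `sqlite3` connection and a hypothetical `semantic_refs` table; the real schema and signature are in PR #232):

```python
import sqlite3

def get_metadata_multiple(
    conn: sqlite3.Connection, semref_ids: list[int]
) -> list[tuple[int, str, str]]:
    # One SELECT for all scored refs, reading only the three columns
    # the scope filters use. knowledge_json is never fetched, so the
    # 64%-of-cost deserialization disappears from the loop entirely.
    placeholders = ",".join("?" * len(semref_ids))
    cur = conn.execute(
        "SELECT semref_id, range_json, knowledge_type"
        f" FROM semantic_refs WHERE semref_id IN ({placeholders})",
        semref_ids,
    )
    return cur.fetchall()
```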
### Import time (1.16x — 791ms to 683ms)
1. **Deferred `black` import**: `black` was imported at module level but only used in cold formatting paths. Moving the import inside those functions saved 78ms
2. **Replaced `black` with stdlib `pprint`**: removed `black` from runtime dependencies entirely
3. **Deferred query-time imports**: moved 4 query-only modules behind `TYPE_CHECKING` guards, saving 51ms — **rejected by maintainer** as not worth the complexity given the LLM roundtrip dominates (both deferral patterns are sketched below)
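Both deferral patterns with illustrative names (`format_record`, `run_query`, and `some_query_module` are assumptions, not typeagent's actual API):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # #236's pattern: the module-level import moves behind the guard,
    # so type checkers still see it but `import typeagent` doesn't run it.
    from some_query_module import QueryContext  # hypothetical

def run_query(raw: str) -> "QueryContext":
    # At runtime the import happens here, on the first query, instead
    # of at package import time.
    from some_query_module import QueryContext
    return QueryContext(raw)

def format_record(record: object) -> str:
    # #229's pattern: defer the heavy dependency to first call so the
    # top-level import doesn't pay its ~78ms cost.
    import black
    return black.format_str(repr(record), mode=black.Mode())
```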
### Indexing pipeline (1.14x–1.16x)
Batched SQLite INSERTs via `executemany` instead of individual `cursor.execute()` per term/property. Reduced 1000+ INSERT calls to 23 for 200 messages.
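The change in sketch form (the `term_index` table name here is illustrative; real names are in PR #230):

```python
import sqlite3

def index_terms(conn: sqlite3.Connection, rows: list[tuple[str, int]]) -> None:
    # Before: one cursor.execute() per (term, semref_id) pair, i.e.
    # 1000+ round trips through SQLite's statement machinery.
    # After: one prepared statement run over the whole batch.
    conn.executemany(
        "INSERT INTO term_index (term, semref_id) VALUES (?, ?)",
        rows,
    )
```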
### Bugfix
Fixed `parse_azure_endpoint` passing query string to `AsyncAzureOpenAI`, producing mangled URLs. Added 6 unit tests.
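The fix is essentially URL hygiene. A sketch of the corrected behavior (the real function's signature in PR #231 may differ):

```python
from urllib.parse import parse_qs, urlparse

def parse_azure_endpoint(url: str) -> tuple[str, str | None]:
    # Split the query string out instead of forwarding it verbatim:
    # AsyncAzureOpenAI wants a bare endpoint plus a separate
    # api_version, not an endpoint with "?api-version=..." glued on.
    parts = urlparse(url)
    endpoint = f"{parts.scheme}://{parts.netloc}"
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return endpoint, api_version
```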
## Results
| Category | Benchmark | Before | After | Speedup |
|---|---|---:|---:|---:|
| Import | `import typeagent` | 791 ms | 683 ms | **1.16x** |
| Query | `lookup_term_filtered` (200 matches) | 2.652 ms | 1.260 ms | **2.10x** |
| Query | `group_matches_by_type` (200 matches) | 2.453 ms | 992 µs | **2.47x** |
| Query | `lookup_property_in_property_index` (200 matches) | 24.5 ms | 9.4 ms | **2.61x** |
| Query | `get_matches_in_scope` (200 matches) | 24.1 ms | 9.2 ms | **2.62x** |
| Indexing | `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
| E2E | 69 offline tests | 5.72 s | 5.60 s | **1.02x** |
## Upstream status
| PR | Status | Description |
|---|---|---|
| #231 | **Merged** | Fix parse_azure_endpoint |
| #229 | **Merged** | Defer black import |
| #234 | **Merged** | Vectorize fuzzy_lookup_embedding (see failure below) |
| #235 | **Merged** | Replace black with stdlib pprint |
| #236 | **Closed** | Defer query imports (rejected — not worth complexity) |
| #230 | In review | Batch SQLite INSERTs |
| #232 | Blocked on #230 | Batch metadata query |
4 of 7 upstream PRs merged, 1 rejected, 2 still open (#230 in review, #232 blocked on it).
## Key takeaways
### 1. Validate bottlenecks against real workloads, not synthetic benchmarks
We optimized vector search hot paths in a community contributor's PR (#228), achieving 3.7x–14.2x speedups on micro-benchmarks. Guido pointed out that model API calls and SQL queries dominate real-world latency — the numpy iteration time is noise. A 14x speedup on a non-bottleneck is worth 0x in practice.
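For context, the change itself was of this shape (illustrative sketch; the real function is `fuzzy_lookup_embedding`, and the names below are assumptions):

```python
import numpy as np

def fuzzy_lookup(query: np.ndarray, embeddings: np.ndarray,
                 min_score: float) -> list[int]:
    # One matrix-vector product over the whole (n, d) embedding table
    # replaces a per-row Python loop of dot products. Fast on a
    # micro-benchmark, and still irrelevant next to an LLM roundtrip.
    scores = embeddings @ query
    return np.flatnonzero(scores >= min_score).tolist()
```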
Similarly, #236 (defer query imports, 51ms) was rejected because ingestion takes seconds and queries always need those modules anyway. The maintainer's perspective on what matters was more accurate than our profiler data.
### 2. Don't optimize unmerged third-party code
PR #228 was third-party work with its own issues — Guido later showed with real evals that the proposed `min_score` tuning made results *worse*. The contributor merged our vector search PR without review ("Noice"). We picked a target based on what was convenient to optimize, not what the project needed.
### 3. Maintainer engagement style matters
Guido's review style is direct and opinionated. He rejected `from __future__ import annotations` on principle (will be deprecated), rejected import-time optimizations as not worth complexity, and expected us to justify why a bottleneck matters before optimizing it. Engaging early with "is this worth doing?" (#229 comment) was the right approach — we should have done that for every PR, not just the first one.
### 4. Code review feedback is where the real value shows
@bmerkle's review of #230 caught a real bug (`inverse_actions` not indexed) and identified code duplication. Our response to that review — fixing the bug, eliminating duplicated functions — was arguably more valuable than the original optimization.
## Applicable to codeflash
- **N+1 query patterns**: The `get_metadata_multiple` pattern (fetch only needed columns in batch, skip expensive deserialization) applies to any codeflash code that iterates over DB results and deserializes each one
- **Validate before optimizing**: Build maintainer alignment into the workflow — profile the real workload, confirm the bottleneck matters, then optimize. Don't let impressive micro-benchmark numbers drive target selection
- **Import deferral has limits**: For libraries where LLM roundtrips dominate, shaving 50ms off import time isn't worth code complexity. Know the latency budget before investing