Add typeagent-py case study (#17)

- Add case-studies/microsoft/typeagent/summary.md with results, lessons
  learned (failed vector search experiment, maintainer alignment), and
  takeaways for codeflash
- Update upstream PR statuses: #235 merged, #236 closed (rejected),
  #232 blocked on #230
- Add typeagent to main README results table
Kevin Turcios 2026-04-14 23:25:29 -05:00 committed by GitHub
parent 6dd3b02168
commit 09ba9b44b2
3 changed files with 88 additions and 4 deletions


@@ -138,11 +138,11 @@ Stacked PRs — merge in order (#231 first, each subsequent PR builds on the pre
|---|---|---|---|
| 1 | [microsoft/typeagent-py#231](https://github.com/microsoft/typeagent-py/pull/231) | **Merged** | Fix parse_azure_endpoint passing query string to AsyncAzureOpenAI |
| 2 | [microsoft/typeagent-py#229](https://github.com/microsoft/typeagent-py/pull/229) | **Merged** | Defer black import to first use |
-| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | Open | Batch SQLite INSERTs for indexing pipeline |
-| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Open | Batch metadata query to avoid N+1 across 5 call sites |
+| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | In review | Batch SQLite INSERTs for indexing pipeline (reviewer: @bmerkle, comments addressed) |
+| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Blocked on #230 | Batch metadata query to avoid N+1 across 5 call sites (reviewer: @bmerkle, 4 comments — will rebase after #230 merges) |
| 5 | [microsoft/typeagent-py#234](https://github.com/microsoft/typeagent-py/pull/234) | **Merged** | Vectorize fuzzy_lookup_embedding with numpy ops (see [Lessons Learned](#lessons-learned)) |
-| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | Open | Replace black with stdlib pprint for runtime formatting |
-| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | Open | Defer query-time imports in conversation_base |
+| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | **Merged** | Replace black with stdlib pprint for runtime formatting |
+| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | **Closed** | Defer query-time imports in conversation_base (rejected — maintainer said savings not worth complexity) |
| — | [shreejaykurhade/typeagent-py#1](https://github.com/shreejaykurhade/typeagent-py/pull/1) | **Merged** | Numpy vectorized fuzzy lookup (against PR #228's fork — see [Lessons Learned](#lessons-learned)) |
### Fork PRs


@@ -8,6 +8,7 @@ Autonomous performance optimization platform. Profiles code, implements optimiza
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/pypa/pip/summary.md) |
+| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |
## Domains


@@ -0,0 +1,83 @@
# typeagent-py Optimization — Lessons Learned
Full case study: [.codeflash/krrt7/microsoft/typeagent/](../../../.codeflash/krrt7/microsoft/typeagent/)
## Context
[typeagent-py](https://github.com/microsoft/typeagent-py) is Microsoft's Python library for structured RAG — ingesting conversations, building semantic indexes, and querying them with LLM-backed answer generation. Maintained by Guido van Rossum.
We targeted three areas: import time (heavy eager imports), indexing pipeline (N+1 SQLite inserts), and query path (N+1 metadata fetches with unnecessary deserialization).
## What we did (by impact)
### Query batching (2.10x–2.62x)
Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern). Profiling showed 64% of per-row cost was deserializing `knowledge_json`, which the filters never use.
Added `get_metadata_multiple` that fetches only `semref_id, range_json, knowledge_type` in a single batch query. Further optimized scope-filtering with binary search in `TextRangeCollection.contains_range` and inline tuple comparisons in `TextRange`.
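A minimal sketch of the shape (assuming a `sqlite3` connection and a hypothetical `semantic_refs` table; the real schema and signature are in PR #232):

```python
import sqlite3

def get_metadata_multiple(
    conn: sqlite3.Connection, semref_ids: list[int]
) -> list[tuple[int, str, str]]:
    # One SELECT for all scored refs, reading only the three columns
    # the scope filters use. knowledge_json is never fetched, so the
    # 64%-of-cost deserialization disappears from the loop entirely.
    placeholders = ",".join("?" * len(semref_ids))
    cur = conn.execute(
        "SELECT semref_id, range_json, knowledge_type"
        f" FROM semantic_refs WHERE semref_id IN ({placeholders})",
        semref_ids,
    )
    return cur.fetchall()
```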
### Import time (1.16x — 791ms to 683ms)
1. **Deferred `black` import**: `black` was imported at module level but only used in cold formatting paths. Moving the import inside those functions saved 78ms
2. **Replaced `black` with stdlib `pprint`**: removed `black` from runtime dependencies entirely
3. **Deferred query-time imports**: moved 4 query-only modules behind `TYPE_CHECKING` guards, saving 51ms — **rejected by maintainer** as not worth the complexity given the LLM roundtrip dominates (both deferral patterns are sketched below)
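Both deferral patterns with illustrative names (`format_record`, `run_query`, and `some_query_module` are assumptions, not typeagent's actual API):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # #236's pattern: the module-level import moves behind the guard,
    # so type checkers still see it but `import typeagent` doesn't run it.
    from some_query_module import QueryContext  # hypothetical

def run_query(raw: str) -> "QueryContext":
    # At runtime the import happens here, on the first query, instead
    # of at package import time.
    from some_query_module import QueryContext
    return QueryContext(raw)

def format_record(record: object) -> str:
    # #229's pattern: defer the heavy dependency to first call so the
    # top-level import doesn't pay its ~78ms cost.
    import black
    return black.format_str(repr(record), mode=black.Mode())
```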
### Indexing pipeline (1.14x–1.16x)
Batched SQLite INSERTs via `executemany` instead of individual `cursor.execute()` per term/property. Reduced 1000+ INSERT calls to 23 for 200 messages.
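The change in sketch form (the `term_index` table name here is illustrative; real names are in PR #230):

```python
import sqlite3

def index_terms(conn: sqlite3.Connection, rows: list[tuple[str, int]]) -> None:
    # Before: one cursor.execute() per (term, semref_id) pair, i.e.
    # 1000+ round trips through SQLite's statement machinery.
    # After: one prepared statement run over the whole batch.
    conn.executemany(
        "INSERT INTO term_index (term, semref_id) VALUES (?, ?)",
        rows,
    )
```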
### Bugfix
Fixed `parse_azure_endpoint` passing query string to `AsyncAzureOpenAI`, producing mangled URLs. Added 6 unit tests.
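The fix is essentially URL hygiene. A sketch of the corrected behavior (the real function's signature in PR #231 may differ):

```python
from urllib.parse import parse_qs, urlparse

def parse_azure_endpoint(url: str) -> tuple[str, str | None]:
    # Split the query string out instead of forwarding it verbatim:
    # AsyncAzureOpenAI wants a bare endpoint plus a separate
    # api_version, not an endpoint with "?api-version=..." glued on.
    parts = urlparse(url)
    endpoint = f"{parts.scheme}://{parts.netloc}"
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return endpoint, api_version
```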
## Results
| Category | Benchmark | Before | After | Speedup |
|---|---|---:|---:|---:|
| Import | `import typeagent` | 791 ms | 683 ms | **1.16x** |
| Query | `lookup_term_filtered` (200 matches) | 2.652 ms | 1.260 ms | **2.10x** |
| Query | `group_matches_by_type` (200 matches) | 2.453 ms | 992 µs | **2.47x** |
| Query | `lookup_property_in_property_index` (200 matches) | 24.5 ms | 9.4 ms | **2.61x** |
| Query | `get_matches_in_scope` (200 matches) | 24.1 ms | 9.2 ms | **2.62x** |
| Indexing | `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
| E2E | 69 offline tests | 5.72 s | 5.60 s | **1.02x** |
## Upstream status
| PR | Status | Description |
|---|---|---|
| #231 | **Merged** | Fix parse_azure_endpoint |
| #229 | **Merged** | Defer black import |
| #234 | **Merged** | Vectorize fuzzy_lookup_embedding (see failure below) |
| #235 | **Merged** | Replace black with stdlib pprint |
| #236 | **Closed** | Defer query imports (rejected — not worth complexity) |
| #230 | In review | Batch SQLite INSERTs |
| #232 | Blocked on #230 | Batch metadata query |
4 of 7 upstream PRs merged, 1 rejected, 2 still open (#230 in review, #232 blocked on it).
## Key takeaways
### 1. Validate bottlenecks against real workloads, not synthetic benchmarks
We optimized vector search hot paths in a community contributor's PR (#228), achieving 3.7x–14.2x speedups on micro-benchmarks. Guido pointed out that model API calls and SQL queries dominate real-world latency — the numpy iteration time is noise. A 14x speedup on a non-bottleneck is worth 0x in practice.
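For context, the change itself was of this shape (illustrative sketch; the real function is `fuzzy_lookup_embedding`, and the names below are assumptions):

```python
import numpy as np

def fuzzy_lookup(query: np.ndarray, embeddings: np.ndarray,
                 min_score: float) -> list[int]:
    # One matrix-vector product over the whole (n, d) embedding table
    # replaces a per-row Python loop of dot products. Fast on a
    # micro-benchmark, and still irrelevant next to an LLM roundtrip.
    scores = embeddings @ query
    return np.flatnonzero(scores >= min_score).tolist()
```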
Similarly, #236 (defer query imports, 51ms) was rejected because ingestion takes seconds and queries always need those modules anyway. The maintainer's perspective on what matters was more accurate than our profiler data.
### 2. Don't optimize unmerged third-party code
PR #228 was third-party work with its own issues — Guido later showed with real evals that the proposed `min_score` tuning made results *worse*. The contributor merged our vector search PR without review ("Noice"). We picked a target based on what was convenient to optimize, not what the project needed.
### 3. Maintainer engagement style matters
Guido's review style is direct and opinionated. He rejected `from __future__ import annotations` on principle (will be deprecated), rejected import-time optimizations as not worth complexity, and expected us to justify why a bottleneck matters before optimizing it. Engaging early with "is this worth doing?" (#229 comment) was the right approach — we should have done that for every PR, not just the first one.
### 4. Code review feedback is where the real value shows
@bmerkle's review of #230 caught a real bug (`inverse_actions` not indexed) and identified code duplication. Our response to that review — fixing the bug, eliminating duplicated functions — was arguably more valuable than the original optimization.
## Applicable to codeflash
- **N+1 query patterns**: The `get_metadata_multiple` pattern (fetch only needed columns in batch, skip expensive deserialization) applies to any codeflash code that iterates over DB results and deserializes each one
- **Validate before optimizing**: Build maintainer alignment into the workflow — profile the real workload, confirm the bottleneck matters, then optimize. Don't let impressive micro-benchmark numbers drive target selection
- **Import deferral has limits**: For libraries where LLM roundtrips dominate, shaving 50ms off import time isn't worth code complexity. Know the latency budget before investing