Mirror of https://github.com/codeflash-ai/codeflash-agent.git (synced 2026-05-04 18:25:19 +00:00)
Add typeagent-py case study (#17)
- Add case-studies/microsoft/typeagent/summary.md with results, lessons learned (failed vector search experiment, maintainer alignment), and takeaways for codeflash
- Update upstream PR statuses: #235 merged, #236 closed (rejected), #232 blocked on #230
- Add typeagent to main README results table
parent 6dd3b02168, commit 09ba9b44b2
3 changed files with 88 additions and 4 deletions
@@ -138,11 +138,11 @@ Stacked PRs — merge in order (#231 first, each subsequent PR builds on the pre

| # | PR | Status | Description |
|---|---|---|---|
| 1 | [microsoft/typeagent-py#231](https://github.com/microsoft/typeagent-py/pull/231) | **Merged** | Fix parse_azure_endpoint passing query string to AsyncAzureOpenAI |
| 2 | [microsoft/typeagent-py#229](https://github.com/microsoft/typeagent-py/pull/229) | **Merged** | Defer black import to first use |
| 3 | [microsoft/typeagent-py#230](https://github.com/microsoft/typeagent-py/pull/230) | In review | Batch SQLite INSERTs for indexing pipeline (reviewer: @bmerkle, comments addressed) |
| 4 | [microsoft/typeagent-py#232](https://github.com/microsoft/typeagent-py/pull/232) | Blocked on #230 | Batch metadata query to avoid N+1 across 5 call sites (reviewer: @bmerkle, 4 comments — will rebase after #230 merges) |
| 5 | [microsoft/typeagent-py#234](https://github.com/microsoft/typeagent-py/pull/234) | **Merged** | Vectorize fuzzy_lookup_embedding with numpy ops (see [Lessons Learned](#lessons-learned)) |
| 6 | [microsoft/typeagent-py#235](https://github.com/microsoft/typeagent-py/pull/235) | **Merged** | Replace black with stdlib pprint for runtime formatting |
| 7 | [microsoft/typeagent-py#236](https://github.com/microsoft/typeagent-py/pull/236) | **Closed** | Defer query-time imports in conversation_base (rejected — maintainer said savings not worth complexity) |
| — | [shreejaykurhade/typeagent-py#1](https://github.com/shreejaykurhade/typeagent-py/pull/1) | **Merged** | Numpy vectorized fuzzy lookup (against PR #228's fork — see [Lessons Learned](#lessons-learned)) |

### Fork PRs
README.md

@@ -8,6 +8,7 @@ Autonomous performance optimization platform. Profiles code, implements optimiza

| Project | Result | Case study |
|---|---|---|
| Rich | 2x Console import (79ms → 34ms) | [summary](case-studies/textualize/rich/summary.md) |
| pip | 7x `--version` (138ms → 20ms), 1.81x resolver | [summary](case-studies/pypa/pip/summary.md) |
| typeagent-py | 2.6x query path, 1.16x import + indexing | [summary](case-studies/microsoft/typeagent/summary.md) |

## Domains

case-studies/microsoft/typeagent/summary.md (new file, 83 lines added)

@@ -0,0 +1,83 @@

# typeagent-py Optimization — Lessons Learned
Full case study: [.codeflash/krrt7/microsoft/typeagent/](../../../.codeflash/krrt7/microsoft/typeagent/)
## Context
[typeagent-py](https://github.com/microsoft/typeagent-py) is Microsoft's Python library for structured RAG — ingesting conversations, building semantic indexes, and querying them with LLM-backed answer generation. Maintained by Guido van Rossum.
We targeted three areas: import time (heavy eager imports), indexing pipeline (N+1 SQLite inserts), and query path (N+1 metadata fetches with unnecessary deserialization).
## What we did (by impact)
### Query batching (2.10x–2.62x)
Five call sites used `get_item()` per scored ref — one SELECT and full deserialization per match (N+1 pattern). Profiling showed 64% of per-row cost was deserializing `knowledge_json`, which the filters never use.
Added `get_metadata_multiple` that fetches only `semref_id, range_json, knowledge_type` in a single batch query. Further optimized scope-filtering with binary search in `TextRangeCollection.contains_range` and inline tuple comparisons in `TextRange`.
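
A minimal sketch of the batching pattern, assuming a hypothetical `semantic_refs` table and column layout (the real schema and function signature are in PR #232):

```python
import sqlite3


def get_metadata_multiple(
    conn: sqlite3.Connection, semref_ids: list[int]
) -> dict[int, tuple[str, str]]:
    """Batch-fetch only the columns the scope filters actually read.

    Sketch only: the table name and columns here are illustrative,
    not typeagent-py's real schema.
    """
    if not semref_ids:
        return {}
    placeholders = ",".join("?" * len(semref_ids))
    rows = conn.execute(
        "SELECT semref_id, range_json, knowledge_type "
        f"FROM semantic_refs WHERE semref_id IN ({placeholders})",
        semref_ids,
    )
    # One SELECT replaces N get_item() calls, and knowledge_json is never
    # fetched, skipping the deserialization that was 64% of per-row cost.
    return {sid: (range_json, ktype) for sid, range_json, ktype in rows}
```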
### Import time (1.16x — 791ms to 683ms)
1. **Deferred `black` import**: `black` was imported at module level but only used in cold formatting paths. Moved inside functions (−78ms)
2. **Replaced `black` with stdlib `pprint`**: Removed `black` from runtime dependencies entirely
3. **Deferred query-time imports**: Moved 4 query-only modules behind `TYPE_CHECKING` guards (−51ms) — **rejected by maintainer** as not worth the complexity given the LLM roundtrip dominates (both deferral patterns are sketched below)
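
A sketch of both deferral patterns; `query_module`, `QueryPlan`, `format_value`, and `explain` are placeholder names, not the actual typeagent-py code:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Visible to the type checker but never imported at runtime: the
    # approach #236 proposed. query_module and QueryPlan are placeholders.
    from query_module import QueryPlan


def format_value(obj: object) -> str:
    # Deferred import: the formatter is paid for on first call rather than
    # on `import typeagent` (the #229 pattern, with stdlib pprint standing
    # in for black per #235).
    import pprint

    return pprint.pformat(obj)


def explain(plan: "QueryPlan") -> str:
    # String annotation, so QueryPlan needs no runtime import.
    return format_value(plan)
```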
### Indexing pipeline (1.14x–1.16x)
Batched SQLite INSERTs via `executemany` instead of individual `cursor.execute()` per term/property. Reduced 1000+ INSERT calls to 2–3 for 200 messages.
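
The shape of the change, sketched with an illustrative `terms` table (not the project's actual schema):

```python
import sqlite3


def add_terms(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    # Before: one round trip per term, 1000+ execute() calls for 200 messages.
    # for message_id, term in rows:
    #     conn.execute(
    #         "INSERT INTO terms (message_id, term) VALUES (?, ?)",
    #         (message_id, term),
    #     )
    # After: executemany prepares the statement once and reuses it per row.
    conn.executemany("INSERT INTO terms (message_id, term) VALUES (?, ?)", rows)
```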
### Bugfix
Fixed `parse_azure_endpoint` passing query string to `AsyncAzureOpenAI`, producing mangled URLs. Added 6 unit tests.
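
The gist of the fix as a sketch, assuming the configured endpoint carried an `api-version` query parameter; the actual function and its 6 tests are in #231:

```python
from urllib.parse import parse_qs, urlsplit


def parse_azure_endpoint(raw: str) -> tuple[str, str | None]:
    # AsyncAzureOpenAI expects a bare endpoint; passing the query string
    # through produced mangled request URLs.
    parts = urlsplit(raw)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}".rstrip("/")
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return endpoint, api_version
```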
## Results
| Category | Benchmark | Before | After | Speedup |
|---|---|---:|---:|---:|
| Import | `import typeagent` | 791 ms | 683 ms | **1.16x** |
| Query | `lookup_term_filtered` (200 matches) | 2.652 ms | 1.260 ms | **2.10x** |
| Query | `group_matches_by_type` (200 matches) | 2.453 ms | 992 µs | **2.47x** |
| Query | `lookup_property_in_property_index` (200 matches) | 24.5 ms | 9.4 ms | **2.61x** |
| Query | `get_matches_in_scope` (200 matches) | 24.1 ms | 9.2 ms | **2.62x** |
| Indexing | `add_messages_with_indexing` (200 msgs) | 28.8 ms | 25.0 ms | **1.16x** |
| E2E | 69 offline tests | 5.72 s | 5.60 s | **1.02x** |
## Upstream status
| PR | Status | Description |
|---|---|---|
| #231 | **Merged** | Fix parse_azure_endpoint |
| #229 | **Merged** | Defer black import |
| #234 | **Merged** | Vectorize fuzzy_lookup_embedding (see failure below) |
| #235 | **Merged** | Replace black with stdlib pprint |
| #236 | **Closed** | Defer query imports (rejected — not worth complexity) |
| #230 | In review | Batch SQLite INSERTs |
| #232 | Blocked on #230 | Batch metadata query |
4 of 7 upstream PRs merged, 1 rejected, 2 in review.
## Key takeaways
### 1. Validate bottlenecks against real workloads, not synthetic benchmarks
We optimized vector search hot paths in a community contributor's PR (#228), achieving 3.7x–14.2x speedups on micro-benchmarks. Guido pointed out that model API calls and SQL queries dominate real-world latency — the numpy iteration time is noise. A 14x speedup on a non-bottleneck is worth 0x in practice.
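
For reference, the vectorization pattern behind those micro-benchmark numbers looks roughly like this (a sketch, assuming L2-normalized embedding rows so the dot product is cosine similarity; `fuzzy_lookup` is a placeholder name):

```python
import numpy as np


def fuzzy_lookup(
    query: np.ndarray, embeddings: np.ndarray, min_score: float
) -> list[tuple[int, float]]:
    # One matrix-vector product scores every candidate at once, replacing
    # a per-row Python loop. Fast, but only worth merging if this loop is
    # actually the bottleneck.
    scores = embeddings @ query
    hits = np.flatnonzero(scores >= min_score)
    order = hits[np.argsort(scores[hits])[::-1]]  # best matches first
    return [(int(i), float(scores[i])) for i in order]
```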
Similarly, #236 (defer query imports, −51ms) was rejected because ingestion takes seconds and queries always need those modules anyway. The maintainer's perspective on what matters was more accurate than our profiler data.
### 2. Don't optimize unmerged third-party code
PR #228 was third-party work with its own issues — Guido later showed with real evals that the proposed min_score tuning made results *worse*. The contributor merged our vector search PR without review ("Noice"). We picked a target based on what was convenient to optimize, not what the project needed.
### 3. Maintainer engagement style matters
Guido's review style is direct and opinionated. He rejected `from __future__ import annotations` on principle (will be deprecated), rejected import-time optimizations as not worth complexity, and expected us to justify why a bottleneck matters before optimizing it. Engaging early with "is this worth doing?" (#229 comment) was the right approach — we should have done that for every PR, not just the first one.
### 4. Code review feedback is where the real value shows
@bmerkle's review of #230 caught a real bug (inverse_actions not indexed) and identified code duplication. Our response to that review — fixing the bug, eliminating duplicated functions — was arguably more valuable than the original optimization.
## Applicable to codeflash
- **N+1 query patterns**: The `get_metadata_multiple` pattern (fetch only needed columns in batch, skip expensive deserialization) applies to any codeflash code that iterates over DB results and deserializes each one
- **Validate before optimizing**: Build maintainer alignment into the workflow — profile the real workload, confirm the bottleneck matters, then optimize. Don't let impressive micro-benchmark numbers drive target selection
- **Import deferral has limits**: For libraries where LLM roundtrips dominate, shaving 50ms off import time isn't worth code complexity. Know the latency budget before investing