Mirror of https://github.com/codeflash-ai/codeflash-internal.git (synced 2026-05-04 18:25:18 +00:00)
The optimization adds an early-exit check in `calculate_llm_cost` that returns zero immediately when all rate fields (`input_cost`, `cached_input_cost`, `output_cost`) are zero, before extracting token counts via `getattr` calls. Line profiling confirms the hot path: the original spent 70.7% of function time (580 ms) in the final return statement's arithmetic, yet 99.3% of calls (949/956) were for zero-cost models, where the token extraction was wasted work. The optimized version short-circuits these cases in 1.9 ms total, cutting `calculate_llm_cost` from 821 ms to 29 ms (a 96.5% reduction). This cascades to `LLMClient.call`, where cost calculation dropped from 50.5% to 4.3% of method time, yielding an 80% throughput gain (6,165 → 11,097 ops/sec) despite a 37% regression in the concurrency ratio, caused by spending proportionally more time in non-yielding sync code once the async bottleneck was eliminated.
CodeFlash MonoRepo
Here are the projects that are part of the CodeFlash MonoRepo:
- CodeFlash Client - /cli/
- CodeFlash Python Django AI service - /django/aiservice
- CodeFlash NodeJS CF API - /js/cf-api
- CodeFlash Webapp - /js/cf-webapp
Project Setup
Prerequisites
- Node.js and npm: Ensure Node.js is installed and npm is set up, for installing the pre-commit hooks (Lefthook).
- Python and Mamba: Ensure Python is installed and Mamba is set up.
After cloning, run npm install to install all dependencies at the root level.
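The setup steps above, as a sketch. The explicit lefthook install step is an assumption for completeness; npm install alone may already register the hooks via an npm lifecycle script.

```shell
# From the repository root, after cloning:
npm install            # install all root-level dependencies (includes Lefthook)
npx lefthook install   # register the Git pre-commit hooks, if not done automatically
```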
Glossary
Optimization
- Codeflash Optimizer - The overarching technology that solves code optimization.
- Function to Optimize - The target function that we want to optimize.
- Optimization Candidate - Generated code that we think might be an optimization of the code to optimize.
- Helper function - A function that is called by, and is under the code path of, the function to optimize.
- Read-Write Context - The part of the code context provided to the LLM that it can modify. Also known as the Code to Optimize.
- Read-Only Context - The part of the context that is provided only as information to the LLM. It is not expected to be modified.
Test generation
- Verification - System to verify if the optimization candidate has the same functional behavior as the function to optimize.
- Existing Tests - All the existing tests that are present in a repo.
- Generated Test - The tests that we create for the user using the LLM.
- Tracer - Our technology that collects and dumps the input arguments and other info for a Python executable.
- Replay test - This test reruns all the inputs for a function to optimize that were collected by the tracer.
- Inspired Regression Tests - Newly generated tests "inspired" by existing tests: new test cases produced by the LLM after it studies the existing test cases and the function to optimize to understand how the code behaves.
- Comparator - Our function that compares any two Python objects and returns True if they are equal, False otherwise.
Infra and Systems
- CF API - The JavaScript web service that currently serves the GitHub App.
- AI Service - The Python Django service that serves the AI endpoints.
- Webapp - The React web application, written in Next.js. Users can generate API keys etc. here.
- PostHog - Our events tracking and product analytics 3rd party tool.
- Sentry - Our crash telemetry service that helps us understand how Codeflash fails.