Database Query Optimization for Java
This guide covers JPA/Hibernate performance patterns, connection pooling, query optimization, caching, batch operations, and EXPLAIN plan verification. For general database verification tiers (EXPLAIN comparison, result diffing, generated integration tests), see the shared database reference patterns.
JPA/Hibernate N+1 Problem
The N+1 problem is the most common JPA performance issue. Loading a parent entity and then accessing its lazy-loaded children triggers N additional queries -- one per parent row.
Detection
```java
// BAD: N+1 -- one query per order to load its items
List<Order> orders = entityManager.createQuery(
        "SELECT o FROM Order o WHERE o.status = :status", Order.class)
    .setParameter("status", "ACTIVE")
    .getResultList();
for (Order order : orders) {
    order.getItems().size(); // triggers SELECT * FROM order_item WHERE order_id = ?
    // One query per order -- if 100 orders, 101 queries total
}
```
Detection signals:
- Hibernate SQL logging (`spring.jpa.show-sql=true` or `hibernate.show_sql=true`) shows repeated queries with different parameter values
- `hibernate.generate_statistics=true` shows a high `prepareStatement` count
- P6Spy or datasource-proxy logs show query counts per request
Fix 1: JOIN FETCH (JPQL)
```java
// GOOD: single query with JOIN
List<Order> orders = entityManager.createQuery(
        "SELECT DISTINCT o FROM Order o JOIN FETCH o.items WHERE o.status = :status", Order.class)
    .setParameter("status", "ACTIVE")
    .getResultList();
// All items already loaded -- no additional queries
for (Order order : orders) {
    order.getItems().size(); // no query -- already fetched
}
```
Warning: `JOIN FETCH` with multiple collections causes a cartesian product. Hibernate limits you to one `JOIN FETCH` collection per query (or use `@Fetch(FetchMode.SUBSELECT)` for the second collection).
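The blowup from fetching two collections at once is easy to quantify. A minimal arithmetic sketch (the second `payments` collection is hypothetical, purely for illustration):

```java
// Joining two collections in one query duplicates each parent row once per
// (item, payment) combination, so the JDBC result set grows multiplicatively.
public class CartesianBlowup {
    static long resultRows(long parents, long itemsPerParent, long paymentsPerParent) {
        return parents * itemsPerParent * paymentsPerParent;
    }
    public static void main(String[] args) {
        // 100 orders x 10 items x 5 payments = 5,000 rows to hydrate 100 entities
        System.out.println(resultRows(100, 10, 5));
    }
}
```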
Fix 2: @EntityGraph
```java
@Entity
public class Order {
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderItem> items;
}

// Define the entity graph (Spring Data repository method)
@EntityGraph(attributePaths = {"items", "items.product"})
List<Order> findByStatus(String status);

// Or programmatically
EntityGraph<Order> graph = entityManager.createEntityGraph(Order.class);
graph.addAttributeNodes("items");
Subgraph<OrderItem> itemGraph = graph.addSubgraph("items");
itemGraph.addAttributeNodes("product");
List<Order> orders = entityManager.createQuery("SELECT o FROM Order o", Order.class)
    .setHint("javax.persistence.fetchgraph", graph)
    .getResultList();
```
Fix 3: @BatchSize
```java
@Entity
public class Order {
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    @BatchSize(size = 25) // loads items in batches of 25 orders
    private List<OrderItem> items;
}
```
Instead of N queries, Hibernate issues `ceil(N/25)` queries using `WHERE order_id IN (?, ?, ..., ?)`.
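The query count is plain ceiling division; a small self-contained sketch of the arithmetic:

```java
// Number of IN-list queries Hibernate issues to load a lazy collection for
// N parents when the association is annotated @BatchSize(size = batchSize).
public class BatchFetchMath {
    static int queriesIssued(int parents, int batchSize) {
        return (parents + batchSize - 1) / batchSize; // ceil(parents / batchSize)
    }
    public static void main(String[] args) {
        System.out.println(queriesIssued(100, 25)); // 4 queries instead of 100
    }
}
```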
Global batch size
```properties
# application.properties (Spring Boot)
spring.jpa.properties.hibernate.default_batch_fetch_size=25
```
This applies batch fetching to ALL lazy associations globally -- often the single highest-impact Hibernate tuning parameter.
Fix selection guide
| Scenario | Best fix | Why |
|---|---|---|
| Always need children with parent | `JOIN FETCH` | Single query, minimal overhead |
| Sometimes need children | `@EntityGraph` on specific queries | Selective eager loading per use case |
| Multiple collections on entity | `@BatchSize` or `default_batch_fetch_size` | Avoids cartesian product from multiple JOINs |
| Large result sets | `@BatchSize` | JOIN FETCH with pagination is problematic (Hibernate warns about applying in-memory pagination) |
Connection Pooling
HikariCP Configuration
HikariCP is the default connection pool for Spring Boot 2+ and the recommended pool for any Java application.
```properties
# Essential settings
spring.datasource.hikari.minimum-idle=5                  # min connections kept open (default: same as max)
spring.datasource.hikari.maximum-pool-size=10            # max connections (default: 10)
spring.datasource.hikari.connection-timeout=30000        # ms to wait for a connection (default: 30s)
spring.datasource.hikari.max-lifetime=1800000            # max connection age before recycling (default: 30min)
spring.datasource.hikari.idle-timeout=600000             # max idle time before eviction (default: 10min)
spring.datasource.hikari.leak-detection-threshold=60000  # ms -- log warning if connection not returned
```
Common misconfigurations
| Misconfiguration | Symptom | Fix |
|---|---|---|
| `maximum-pool-size` too small | `SQLTransientConnectionException: Connection not available, request timed out after 30000ms` | Increase pool size. Rule of thumb: `pool_size = (core_count * 2) + effective_spindle_count`. For SSDs, start at ~10. |
| `maximum-pool-size` too large | Database overwhelmed with connections, context-switching overhead | PostgreSQL: keep total connections (across all app instances) under `max_connections`. Each idle connection uses ~10 MB of DB memory. |
| `connection-timeout` too short | Spurious timeouts during traffic spikes | Increase to 30-60s. If timeouts persist, the pool is too small. |
| `max-lifetime` not set or too high | Connections go stale; database restarts cause errors | Set to 5 minutes less than the database's `wait_timeout` / `idle_in_transaction_session_timeout`. |
| `minimum-idle` = `maximum-pool-size` | Pool never shrinks during idle periods | Set `minimum-idle` lower to release connections during off-peak. |
| No `leak-detection-threshold` | Connection leaks go undetected until pool exhaustion | Set to `60000` (60s). Logs a warning with a stack trace when a connection isn't returned within the threshold. |
Connection pool sizing formula
The PostgreSQL wiki suggests: `pool_size = ((core_count * 2) + effective_spindle_count)`. For most modern servers with SSDs:
- 4-core machine: 10 connections
- 8-core machine: 20 connections
- More is NOT always better -- beyond the optimal point, context switching and lock contention reduce throughput
Multiple app instances: If you have 4 app instances each with a pool of 10, the database sees 40 connections. Size accordingly.
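A minimal sketch of the heuristic above (the effective spindle counts chosen here are illustrative assumptions for SSD-backed machines, matching the example numbers):

```java
// PostgreSQL-wiki pool sizing heuristic: (core_count * 2) + effective_spindle_count.
public class PoolSizing {
    static int suggestedPoolSize(int coreCount, int effectiveSpindleCount) {
        return (coreCount * 2) + effectiveSpindleCount;
    }
    public static void main(String[] args) {
        System.out.println(suggestedPoolSize(4, 2)); // 4-core machine -> 10 connections
        System.out.println(suggestedPoolSize(8, 4)); // 8-core machine -> 20 connections
    }
}
```

The heuristic bounds total connections at the database, so divide the result across app instances rather than giving each instance the full budget.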
Query Optimization
JPQL vs Criteria API vs Native SQL
| Approach | Type-safe | Readable | Performance | Use when |
|---|---|---|---|---|
| JPQL | No (string) | High | Good | Simple queries, most use cases |
| Criteria API | Yes | Low (verbose) | Same as JPQL (same query plan) | Dynamic queries with optional filters |
| Native SQL | No | Medium | Best (full DB feature access) | Complex aggregations, CTEs, window functions, DB-specific features |
Pagination: OFFSET vs Keyset
```java
// OFFSET pagination: simple but slow for deep pages
// Page 1000 = database reads and discards 999 * pageSize rows
List<Order> page = entityManager.createQuery(
        "SELECT o FROM Order o ORDER BY o.createdAt DESC", Order.class)
    .setFirstResult(999 * 20) // skip 19,980 rows
    .setMaxResults(20)
    .getResultList();
```

```java
// KEYSET pagination: constant performance regardless of page depth
// Pass the last seen value from the previous page
List<Order> page = entityManager.createQuery(
        "SELECT o FROM Order o WHERE o.createdAt < :cursor ORDER BY o.createdAt DESC", Order.class)
    .setParameter("cursor", lastSeenCreatedAt)
    .setMaxResults(20)
    .getResultList();
```
Rule: Use OFFSET for shallow pages (< 100 pages) or admin UIs. Use keyset pagination for any user-facing infinite scroll, API pagination, or deep result sets.
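An idealized cost model makes the difference concrete (it ignores index-traversal overhead and assumes the keyset column is indexed):

```java
// Rows the database must read to produce 1-based page `page` of size `pageSize`.
public class PaginationCost {
    // OFFSET: reads and discards every row before the requested page.
    static long rowsReadOffset(long page, long pageSize) {
        return page * pageSize; // (page - 1) * pageSize discarded + pageSize returned
    }
    // KEYSET: the index seek lands directly on the cursor position.
    static long rowsReadKeyset(long pageSize) {
        return pageSize;
    }
    public static void main(String[] args) {
        System.out.println(rowsReadOffset(1000, 20)); // 20,000 rows read for page 1000
        System.out.println(rowsReadKeyset(20));       // 20 rows at any depth
    }
}
```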
Projection: DTO vs Entity
```java
// FULL ENTITY: loads all columns, managed by persistence context
List<Order> orders = entityManager.createQuery(
        "SELECT o FROM Order o WHERE o.status = :status", Order.class)
    .setParameter("status", "ACTIVE")
    .getResultList();
// Each Order is tracked for dirty checking, occupies identity map memory

// DTO PROJECTION: loads only needed columns, not managed
List<OrderSummary> summaries = entityManager.createQuery(
        "SELECT new com.example.OrderSummary(o.id, o.total, o.createdAt) " +
        "FROM Order o WHERE o.status = :status", OrderSummary.class)
    .setParameter("status", "ACTIVE")
    .getResultList();
// Lightweight: no dirty checking, no identity map, less memory

// TUPLE PROJECTION (Criteria API)
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Tuple> q = cb.createTupleQuery();
Root<Order> root = q.from(Order.class);
q.multiselect(root.get("id"), root.get("total"));
```
Rule: Use DTO projections for read-only queries (reports, lists, API responses). Use entity loading only when you need to modify the entity or traverse lazy relationships.
Second-Level Cache
Configuration (Ehcache / Caffeine)
```properties
# Enable second-level cache
spring.jpa.properties.hibernate.cache.use_second_level_cache=true
spring.jpa.properties.hibernate.cache.region.factory_class=org.hibernate.cache.jcache.JCacheRegionFactory
spring.jpa.properties.hibernate.javax.cache.provider=org.ehcache.jsr107.EhcacheCachingProvider

# Enable query cache (caches JPQL/HQL query results)
spring.jpa.properties.hibernate.cache.use_query_cache=true
```
Entity cache
```java
@Entity
@Cacheable
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Product {
    // Cached after first load. Subsequent findById() returns from cache.
}
```
Query cache
```java
List<Product> products = entityManager.createQuery(
        "SELECT p FROM Product p WHERE p.category = :cat", Product.class)
    .setParameter("cat", "electronics")
    .setHint("org.hibernate.cacheable", true) // enable query cache for this query
    .getResultList();
```
When to cache vs when to avoid
| Cache type | Use when | Avoid when |
|---|---|---|
| Entity cache | Read-heavy entities updated rarely (products, configuration, reference data) | Frequently updated entities (orders, events, logs) |
| Query cache | Same query with same parameters runs repeatedly | Queries with frequently-changing underlying data (cache invalidated on any table change) |
| Collection cache | `@Cache` on a `@OneToMany` -- collection accessed repeatedly and rarely modified | Large collections or frequently-modified collections |
Warning: The query cache is invalidated when ANY entity in the queried table changes. For tables with frequent writes, the cache hit rate drops to near zero and the cache management overhead makes things slower.
Batch Operations
Hibernate batch inserts
```properties
# Enable JDBC batching
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
```

```java
// Batch insert with periodic flush/clear
for (int i = 0; i < 10_000; i++) {
    entityManager.persist(new OrderItem(/* ... */));
    if ((i + 1) % 50 == 0) { // every 50 entities, matching batch_size
        entityManager.flush(); // execute batched INSERTs
        entityManager.clear(); // detach all entities (free memory)
    }
}
```
JDBC batch inserts (bypass Hibernate)
For maximum insert throughput, bypass Hibernate entirely:
```java
@Autowired
JdbcTemplate jdbcTemplate;

jdbcTemplate.batchUpdate(
    "INSERT INTO order_item (order_id, product_id, quantity) VALUES (?, ?, ?)",
    items, 1000, // batch size
    (ps, item) -> {
        ps.setLong(1, item.getOrderId());
        ps.setLong(2, item.getProductId());
        ps.setInt(3, item.getQuantity());
    }
);
```
Statement ordering
When batch-inserting entities with multiple types, Hibernate may interleave INSERT statements for different tables, breaking JDBC batching. Enable statement ordering:
```properties
hibernate.order_inserts=true   # group INSERTs by table
hibernate.order_updates=true   # group UPDATEs by table
```
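The effect of ordering can be shown with a toy model: JDBC batching flushes whenever the SQL statement changes, so interleaved inserts degrade to one flush per row. This sketch illustrates the mechanism only; it is not Hibernate's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class StatementOrdering {
    // Counts batch flushes for a sequence of statements: a flush happens each
    // time the statement text changes, plus once at the end for the tail batch.
    static int batchFlushes(List<String> statements) {
        int flushes = 0;
        String current = null;
        for (String s : statements) {
            if (!s.equals(current)) {
                if (current != null) flushes++;
                current = s;
            }
        }
        return current == null ? flushes : flushes + 1;
    }
    public static void main(String[] args) {
        // Interleaved (no ordering): every statement change flushes the batch
        System.out.println(batchFlushes(Arrays.asList("order", "item", "order", "item"))); // 4
        // Ordered by table: two full batches
        System.out.println(batchFlushes(Arrays.asList("order", "order", "item", "item"))); // 2
    }
}
```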
EXPLAIN Plan Verification
Running EXPLAIN from Java
```java
// Spring JdbcTemplate
String plan = jdbcTemplate.queryForObject(
    "EXPLAIN (FORMAT JSON) SELECT * FROM orders WHERE status = ? AND created_at > ?",
    String.class, "ACTIVE", cutoffDate);
```

```java
// EntityManager native query -- each row of EXPLAIN (FORMAT TEXT) output
// is a single String column
Query q = entityManager.createNativeQuery(
    "EXPLAIN (FORMAT TEXT) SELECT * FROM orders WHERE status = ?1 AND created_at > ?2");
q.setParameter(1, "ACTIVE");
q.setParameter(2, cutoffDate);
List<?> planRows = q.getResultList();
for (Object row : planRows) {
    System.out.println(row);
}
```
What to check
| Check | What to look for | Problem if wrong |
|---|---|---|
| Scan type | `Index Scan` or `Index Only Scan` on filtered columns | `Seq Scan` on a large table = missing index |
| Estimated rows | Should match actual row count (use `EXPLAIN ANALYZE` in dev) | Stale statistics = wrong query plan. Run `ANALYZE table_name`. |
| Join type | `Nested Loop` for small result sets, `Hash Join` for large | `Nested Loop` on large joins = O(n*m) |
| Sort | `Index Scan` providing order, or a `Sort` node | `Sort` on a large result set = possible disk sort |
| Bitmap Heap Scan | Filter efficiency -- `Rows Removed by Filter` should be low | High `Rows Removed` = index returns too many false matches |
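The scan-type check can be turned into a crude automated smoke test over `EXPLAIN (FORMAT TEXT)` output. Substring matching is a sketch, not a real plan parser, and the index name in the example is hypothetical:

```java
public class PlanChecks {
    // Quick textual checks over a PostgreSQL EXPLAIN (FORMAT TEXT) plan.
    static boolean usesIndex(String plan) {
        return plan.contains("Index Scan") || plan.contains("Index Only Scan");
    }
    static boolean hasSeqScan(String plan) {
        return plan.contains("Seq Scan");
    }
    public static void main(String[] args) {
        String plan = "Index Scan using idx_orders_status_created on orders";
        System.out.println(usesIndex(plan) && !hasSeqScan(plan)); // true
    }
}
```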
Common missing indexes
```java
// If you frequently filter by these patterns, verify indexes exist.
// In JPA, indexes are declared via @Table(indexes = ...) on the entity:
@Entity
@Table(name = "orders", indexes = {
    // 1. Status + timestamp (range query on created_at with status filter)
    @Index(columnList = "status, created_at"),
    // 2. Foreign keys (JPA does NOT auto-create FK indexes -- unlike Django)
    @Index(columnList = "customer_id"),
    // 3. Composite for multi-column WHERE
    @Index(columnList = "tenant_id, status, created_at")
})
public class Order { /* ... */ }

// 4. Partial index (PostgreSQL) for a common filter -- not expressible in JPA;
//    create via native DDL or a Flyway migration:
// CREATE INDEX idx_orders_active ON orders (created_at) WHERE status = 'ACTIVE'
```
Pitfalls
- N+1 is the default: JPA loads associations lazily by default. Every access to an unloaded association triggers a query. Use `default_batch_fetch_size` as a global safety net.
- JOIN FETCH + pagination = in-memory pagination: Hibernate cannot apply LIMIT/OFFSET when JOIN FETCH produces a cartesian product. It loads ALL rows and paginates in memory, logging `HHH90003004: firstResult/maxResults specified with collection fetch; applying in memory`. Use `@BatchSize` or a DTO projection for paginated queries with eager loading.
- Entity loading for read-only queries: Loading full entities for display/API responses wastes memory (identity-map tracking, dirty checking) and may trigger lazy-loading cascades. Use DTO projections.
- HikariCP pool exhaustion during long transactions: A transaction holds a connection for its entire duration. Long transactions (batch processing, report generation) exhaust the pool. Move long operations to a separate data source or use streaming (`ScrollableResults`).
- Query cache with frequent writes: The query cache is table-level -- ANY update to the table invalidates ALL cached queries for that table. For write-heavy tables, the query cache has a near-zero hit rate and adds overhead.
- Missing JDBC batching: Without `hibernate.jdbc.batch_size`, each `persist()` generates a separate `INSERT` statement. For bulk inserts this is orders of magnitude slower than batched inserts.
- Identity generation disables batching: `@GeneratedValue(strategy = GenerationType.IDENTITY)` forces Hibernate to execute the INSERT immediately (to get the generated ID), defeating JDBC batching. Use the `SEQUENCE` strategy instead.