chore: add gh-aw duplicate code detector workflow (#2418)
## Summary - Adds the GitHub Agentic Workflows duplicate code detector, configured for Python and TypeScript/JavaScript with Serena semantic analysis - Runs daily, flags patterns spanning 10+ lines or appearing in 3+ locations - Creates up to 3 issues per run with `[duplicate-code]` prefix ## Notes - Requires Claude API secret configured in repo Actions secrets - `code-quality` and `automated-analysis` labels will be auto-created on first run
This commit is contained in:
parent
4c3deeb7b8
commit
9c5ad8fe06
5 changed files with 1450 additions and 0 deletions
2
.gitattributes
vendored
2
.gitattributes
vendored
|
|
@ -3,3 +3,5 @@
|
|||
*.{bat,[bB][aA][tT]} text eol=crlf
|
||||
*.png binary
|
||||
*.jpg binary
|
||||
|
||||
.github/workflows/*.lock.yml linguist-generated=true merge=ours
|
||||
5
.github/aw/imports/.gitattributes
generated
vendored
Normal file
5
.github/aw/imports/.gitattributes
generated
vendored
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
# Mark all cached import files as generated
|
||||
* linguist-generated=true
|
||||
|
||||
# Use 'ours' merge strategy to keep local cached versions
|
||||
* merge=ours
|
||||
73
.github/aw/imports/github/gh-aw/94662b1dee8ce96c876ba9f33b3ab8be32de82a4/.github_workflows_shared_reporting.md
generated
vendored
Normal file
73
.github/aw/imports/github/gh-aw/94662b1dee8ce96c876ba9f33b3ab8be32de82a4/.github_workflows_shared_reporting.md
generated
vendored
Normal file
|
|
@ -0,0 +1,73 @@
|
|||
---
|
||||
# Report formatting guidelines
|
||||
---
|
||||
|
||||
## Report Structure Guidelines
|
||||
|
||||
### 1. Header Levels
|
||||
**Use h3 (###) or lower for all headers in your issue report to maintain proper document hierarchy.**
|
||||
|
||||
When creating GitHub issues or discussions:
|
||||
- Use `###` (h3) for main sections (e.g., "### Test Summary")
|
||||
- Use `####` (h4) for subsections (e.g., "#### Device-Specific Results")
|
||||
- Never use `##` (h2) or `#` (h1) in reports - these are reserved for titles
|
||||
|
||||
### 2. Progressive Disclosure
|
||||
**Wrap detailed test results in `<details><summary><b>Section Name</b></summary>` tags to improve readability and reduce scrolling.**
|
||||
|
||||
Use collapsible sections for:
|
||||
- Verbose details (full test logs, raw data)
|
||||
- Secondary information (minor warnings, extra context)
|
||||
- Per-item breakdowns when there are many items
|
||||
|
||||
Always keep critical information visible (summary, critical issues, key metrics).
|
||||
|
||||
### 3. Report Structure Pattern
|
||||
|
||||
1. **Overview**: 1-2 paragraphs summarizing key findings
|
||||
2. **Critical Information**: Show immediately (summary stats, critical issues)
|
||||
3. **Details**: Use `<details><summary><b>Section Name</b></summary>` for expanded content
|
||||
4. **Context**: Add helpful metadata (workflow run, date, trigger)
|
||||
|
||||
### Design Principles (Airbnb-Inspired)
|
||||
|
||||
Reports should:
|
||||
- **Build trust through clarity**: Most important info immediately visible
|
||||
- **Exceed expectations**: Add helpful context like trends, comparisons
|
||||
- **Create delight**: Use progressive disclosure to reduce overwhelm
|
||||
- **Maintain consistency**: Follow patterns across all reports
|
||||
|
||||
### Example Report Structure
|
||||
|
||||
```markdown
|
||||
### Summary
|
||||
- Key metric 1: value
|
||||
- Key metric 2: value
|
||||
- Status: ✅/⚠️/❌
|
||||
|
||||
### Critical Issues
|
||||
[Always visible - these are important]
|
||||
|
||||
<details>
|
||||
<summary><b>View Detailed Results</b></summary>
|
||||
|
||||
[Comprehensive details, logs, traces]
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>View All Warnings</b></summary>
|
||||
|
||||
[Minor issues and potential problems]
|
||||
|
||||
</details>
|
||||
|
||||
### Recommendations
|
||||
[Actionable next steps - keep visible]
|
||||
```
|
||||
|
||||
## Workflow Run References
|
||||
|
||||
- Format run IDs as links: `[§12345](https://github.com/owner/repo/actions/runs/12345)`
|
||||
- Include up to 3 most relevant run URLs at end under `**References:**`
|
||||
- Do NOT add footer attribution (system adds automatically)
|
||||
1123
.github/workflows/duplicate-code-detector.lock.yml
generated
vendored
Normal file
1123
.github/workflows/duplicate-code-detector.lock.yml
generated
vendored
Normal file
File diff suppressed because it is too large
Load diff
247
.github/workflows/duplicate-code-detector.md
vendored
Normal file
247
.github/workflows/duplicate-code-detector.md
vendored
Normal file
|
|
@ -0,0 +1,247 @@
|
|||
---
|
||||
name: Duplicate Code Detector
|
||||
description: Identifies duplicate code patterns across the codebase and suggests refactoring opportunities
|
||||
on:
|
||||
workflow_dispatch:
|
||||
pull_request:
|
||||
types: [opened, synchronize]
|
||||
permissions:
|
||||
contents: read
|
||||
issues: read
|
||||
pull-requests: read
|
||||
engine: claude
|
||||
tools:
|
||||
serena: ["python", "javascript", "typescript"]
|
||||
safe-outputs:
|
||||
create-issue:
|
||||
expires: 2d
|
||||
title-prefix: "[duplicate-code] "
|
||||
labels: [code-quality, automated-analysis, cookie]
|
||||
assignees: copilot
|
||||
group: true
|
||||
max: 3
|
||||
timeout-minutes: 15
|
||||
strict: true
|
||||
source: github/gh-aw/.github/workflows/duplicate-code-detector.md@94662b1dee8ce96c876ba9f33b3ab8be32de82a4
|
||||
---
|
||||
|
||||
# Duplicate Code Detection
|
||||
|
||||
Analyze code to identify duplicated patterns using Serena's semantic code analysis capabilities. Report significant findings that require refactoring.
|
||||
|
||||
## Task
|
||||
|
||||
Detect and report code duplication by:
|
||||
|
||||
1. **Analyzing Recent Commits**: Review changes in the latest commits
|
||||
2. **Detecting Duplicated Code**: Identify similar or duplicated code patterns using semantic analysis
|
||||
3. **Reporting Findings**: Create a detailed issue if significant duplication is detected (threshold: >10 lines or 3+ similar patterns)
|
||||
|
||||
## Context
|
||||
|
||||
- **Repository**: ${{ github.repository }}
|
||||
- **Commit ID**: ${{ github.event.head_commit.id }}
|
||||
- **Triggered by**: @${{ github.actor }}
|
||||
|
||||
## Analysis Workflow
|
||||
|
||||
### 1. Project Activation
|
||||
|
||||
Activate the project in Serena:
|
||||
- Use `activate_project` tool with workspace path `${{ github.workspace }}` (mounted repository directory)
|
||||
- This sets up the semantic code analysis environment
|
||||
|
||||
### 2. Changed Files Analysis
|
||||
|
||||
Identify and analyze modified files:
|
||||
- Determine files changed in the recent commits
|
||||
- **ONLY analyze .py, .ts, .tsx, .js, and .jsx files** - exclude all other file types
|
||||
- **Exclude JavaScript files except .cjs** from analysis (files matching patterns: `*.js`, `*.mjs`, `*.jsx`, `*.ts`, `*.tsx`)
|
||||
- **Exclude test files** from analysis (files matching patterns: `*_test.go`, `*.test.js`, `*.test.cjs`, `*.spec.js`, `*.spec.cjs`, `*.test.ts`, `*.spec.ts`, `*_test.py`, `test_*.py`, or located in directories named `test`, `tests`, `__tests__`, or `spec`)
|
||||
- **Exclude workflow files** from analysis (files under `.github/workflows/*`)
|
||||
- Use `get_symbols_overview` to understand file structure
|
||||
- Use `read_file` to examine modified file contents
|
||||
|
||||
### 3. Duplicate Detection
|
||||
|
||||
Apply semantic code analysis to find duplicates:
|
||||
|
||||
**Symbol-Level Analysis**:
|
||||
- For significant functions/methods in changed files, use `find_symbol` to search for similarly named symbols
|
||||
- Use `find_referencing_symbols` to understand usage patterns
|
||||
- Identify functions with similar names in different files (e.g., `processData` across modules)
|
||||
|
||||
**Pattern Search**:
|
||||
- Use `search_for_pattern` to find similar code patterns
|
||||
- Search for duplication indicators:
|
||||
- Similar function signatures
|
||||
- Repeated logic blocks
|
||||
- Similar variable naming patterns
|
||||
- Near-identical code blocks
|
||||
|
||||
**Structural Analysis**:
|
||||
- Use `list_dir` and `find_file` to identify files with similar names or purposes
|
||||
- Compare symbol overviews across files for structural similarities
|
||||
|
||||
### 4. Duplication Evaluation
|
||||
|
||||
Assess findings to identify true code duplication:
|
||||
|
||||
**Duplication Types**:
|
||||
- **Exact Duplication**: Identical code blocks in multiple locations
|
||||
- **Structural Duplication**: Same logic with minor variations (different variable names, etc.)
|
||||
- **Functional Duplication**: Different implementations of the same functionality
|
||||
- **Copy-Paste Programming**: Similar code blocks that could be extracted into shared utilities
|
||||
|
||||
**Assessment Criteria**:
|
||||
- **Severity**: Amount of duplicated code (lines of code, number of occurrences)
|
||||
- **Impact**: Where duplication occurs (critical paths, frequently called code)
|
||||
- **Maintainability**: How duplication affects code maintainability
|
||||
- **Refactoring Opportunity**: Whether duplication can be easily refactored
|
||||
|
||||
### 5. Issue Reporting
|
||||
|
||||
Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run). Each pattern should get its own issue to enable focused remediation.
|
||||
|
||||
**When to Create Issues**:
|
||||
- Only create issues if significant duplication is found (threshold: >10 lines of duplicated code OR 3+ instances of similar patterns)
|
||||
- **Create one issue per distinct pattern** - do NOT bundle multiple patterns in a single issue
|
||||
- Limit to the top 3 most significant patterns if more are found
|
||||
- Use the `create_issue` tool from safe-outputs MCP **once for each pattern**
|
||||
|
||||
**Issue Contents for Each Pattern**:
|
||||
- **Executive Summary**: Brief description of this specific duplication pattern
|
||||
- **Duplication Details**: Specific locations and code blocks for this pattern only
|
||||
- **Severity Assessment**: Impact and maintainability concerns for this pattern
|
||||
- **Refactoring Recommendations**: Suggested approaches to eliminate this pattern
|
||||
- **Code Examples**: Concrete examples with file paths and line numbers for this pattern
|
||||
|
||||
## Detection Scope
|
||||
|
||||
### Report These Issues
|
||||
|
||||
- Identical or nearly identical functions in different files
|
||||
- Repeated code blocks that could be extracted to utilities
|
||||
- Similar classes or modules with overlapping functionality
|
||||
- Copy-pasted code with minor modifications
|
||||
- Duplicated business logic across components
|
||||
|
||||
### Skip These Patterns
|
||||
|
||||
- Standard boilerplate code (imports, exports, etc.)
|
||||
- Test setup/teardown code (acceptable duplication in tests)
|
||||
- **JavaScript files except .cjs** (files matching: `*.js`, `*.mjs`, `*.jsx`, `*.ts`, `*.tsx`)
|
||||
- **All test files** (files matching: `*_test.go`, `*.test.js`, `*.test.cjs`, `*.spec.js`, `*.spec.cjs`, `*.test.ts`, `*.spec.ts`, `*_test.py`, `test_*.py`, or in `test/`, `tests/`, `__tests__/`, `spec/` directories)
|
||||
- **All workflow files** (files under `.github/workflows/*`)
|
||||
- Configuration files with similar structure
|
||||
- Language-specific patterns (constructors, getters/setters)
|
||||
- Small code snippets (<5 lines) unless highly repetitive
|
||||
|
||||
### Analysis Depth
|
||||
|
||||
- **File Type Restriction**: ONLY analyze .py, .ts, .tsx, .js, and .jsx files - ignore all other file types
|
||||
- **Primary Focus**: All .py, .ts, .tsx, .js, and .jsx files changed in the current push (excluding test files and workflow files)
|
||||
- **Secondary Analysis**: Check for duplication with existing Python and TypeScript/JavaScript codebase (excluding test files and workflow files)
|
||||
- **Cross-Reference**: Look for patterns across .py, .ts, .tsx, .js, and .jsx files in the repository
|
||||
- **Historical Context**: Consider if duplication is new or existing
|
||||
|
||||
## Issue Template
|
||||
|
||||
For each distinct duplication pattern found, create a separate issue using this structure:
|
||||
|
||||
```markdown
|
||||
# 🔍 Duplicate Code Detected: [Pattern Name]
|
||||
|
||||
*Analysis of commit ${{ github.event.head_commit.id }}*
|
||||
|
||||
**Assignee**: @copilot
|
||||
|
||||
## Summary
|
||||
|
||||
[Brief overview of this specific duplication pattern]
|
||||
|
||||
## Duplication Details
|
||||
|
||||
### Pattern: [Description]
|
||||
- **Severity**: High/Medium/Low
|
||||
- **Occurrences**: [Number of instances]
|
||||
- **Locations**:
|
||||
- `path/to/file1.ext` (lines X-Y)
|
||||
- `path/to/file2.ext` (lines A-B)
|
||||
- **Code Sample**:
|
||||
```[language]
|
||||
[Example of duplicated code]
|
||||
```
|
||||
|
||||
## Impact Analysis
|
||||
|
||||
- **Maintainability**: [How this affects code maintenance]
|
||||
- **Bug Risk**: [Potential for inconsistent fixes]
|
||||
- **Code Bloat**: [Impact on codebase size]
|
||||
|
||||
## Refactoring Recommendations
|
||||
|
||||
1. **[Recommendation 1]**
|
||||
- Extract common functionality to: `suggested/path/utility.ext`
|
||||
- Estimated effort: [hours/complexity]
|
||||
- Benefits: [specific improvements]
|
||||
|
||||
2. **[Recommendation 2]**
|
||||
[... additional recommendations ...]
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
- [ ] Review duplication findings
|
||||
- [ ] Prioritize refactoring tasks
|
||||
- [ ] Create refactoring plan
|
||||
- [ ] Implement changes
|
||||
- [ ] Update tests
|
||||
- [ ] Verify no functionality broken
|
||||
|
||||
## Analysis Metadata
|
||||
|
||||
- **Analyzed Files**: [count]
|
||||
- **Detection Method**: Serena semantic code analysis
|
||||
- **Commit**: ${{ github.event.head_commit.id }}
|
||||
- **Analysis Date**: [timestamp]
|
||||
```
|
||||
|
||||
## Operational Guidelines
|
||||
|
||||
### Security
|
||||
- Never execute untrusted code or commands
|
||||
- Only use Serena's read-only analysis tools
|
||||
- Do not modify files during analysis
|
||||
|
||||
### Efficiency
|
||||
- Focus on recently changed files first
|
||||
- Use semantic analysis for meaningful duplication, not superficial matches
|
||||
- Stay within timeout limits (balance thoroughness with execution time)
|
||||
|
||||
### Accuracy
|
||||
- Verify findings before reporting
|
||||
- Distinguish between acceptable patterns and true duplication
|
||||
- Consider language-specific idioms and best practices
|
||||
- Provide specific, actionable recommendations
|
||||
|
||||
### Issue Creation
|
||||
- Create **one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
|
||||
- Limit to the top 3 most significant patterns if more are found
|
||||
- Only create issues if significant duplication is found
|
||||
- Include sufficient detail for SWE agents to understand and act on findings
|
||||
- Provide concrete examples with file paths and line numbers
|
||||
- Suggest practical refactoring approaches
|
||||
- Assign issue to @copilot for automated remediation
|
||||
- Use descriptive titles that clearly identify the specific pattern (e.g., "Duplicate Code: Error Handling Pattern in Parser Module")
|
||||
|
||||
## Tool Usage Sequence
|
||||
|
||||
1. **Project Setup**: `activate_project` with repository path
|
||||
2. **File Discovery**: `list_dir`, `find_file` for changed files
|
||||
3. **Symbol Analysis**: `get_symbols_overview` for structure understanding
|
||||
4. **Content Review**: `read_file` for detailed code examination
|
||||
5. **Pattern Matching**: `search_for_pattern` for similar code
|
||||
6. **Symbol Search**: `find_symbol` for duplicate function names
|
||||
7. **Reference Analysis**: `find_referencing_symbols` for usage patterns
|
||||
|
||||
**Objective**: Improve code quality by identifying and reporting meaningful code duplication that impacts maintainability. Focus on actionable findings that enable automated or manual refactoring.
|
||||
Loading…
Reference in a new issue