## Summary

- Adds the GitHub Agentic Workflows duplicate code detector, configured for Python and TypeScript/JavaScript with Serena semantic analysis
- Runs daily, flags patterns spanning 10+ lines or appearing in 3+ locations
- Creates up to 3 issues per run with `[duplicate-code]` prefix

## Notes

- Requires the Claude API secret to be configured in the repo's Actions secrets
- `code-quality` and `automated-analysis` labels will be auto-created on first run
| name | description | on | permissions | engine | tools | safe-outputs | timeout-minutes | strict | source |
|---|---|---|---|---|---|---|---|---|---|
| Duplicate Code Detector | Identifies duplicate code patterns across the codebase and suggests refactoring opportunities | | | claude | | | 15 | true | github/gh-aw/.github/workflows/duplicate-code-detector.md@94662b1dee8ce96c876ba9f33b3ab8be32de82a4 |
# Duplicate Code Detection

Analyze code to identify duplicated patterns using Serena's semantic code analysis capabilities. Report significant findings that require refactoring.

## Task

Detect and report code duplication by:

- **Analyzing Recent Commits**: Review changes in the latest commits
- **Detecting Duplicated Code**: Identify similar or duplicated code patterns using semantic analysis
- **Reporting Findings**: Create a detailed issue if significant duplication is detected (threshold: >10 lines or 3+ similar patterns)
## Context

- Repository: ${{ github.repository }}
- Commit ID: ${{ github.event.head_commit.id }}
- Triggered by: @${{ github.actor }}
## Analysis Workflow

### 1. Project Activation

Activate the project in Serena:

- Use the `activate_project` tool with workspace path `${{ github.workspace }}` (the mounted repository directory)
- This sets up the semantic code analysis environment
### 2. Changed Files Analysis

Identify and analyze modified files:

- Determine files changed in the recent commits
- ONLY analyze `.py`, `.ts`, `.tsx`, `.js`, and `.jsx` files - exclude all other file types
- Exclude test files from analysis (files matching patterns: `*_test.go`, `*.test.js`, `*.test.cjs`, `*.spec.js`, `*.spec.cjs`, `*.test.ts`, `*.spec.ts`, `*_test.py`, `test_*.py`, or located in directories named `test`, `tests`, `__tests__`, or `spec`)
- Exclude workflow files from analysis (files under `.github/workflows/*`)
- Use `get_symbols_overview` to understand file structure
- Use `read_file` to examine modified file contents
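The file-scoping rules above amount to a small include/exclude filter. As a rough sketch (this helper is illustrative only and not part of the workflow or the Serena API; the pattern lists are copied from the rules above):

```python
import fnmatch
import os

# Patterns copied from the analysis rules above (illustrative sketch only).
INCLUDE_EXTS = (".py", ".ts", ".tsx", ".js", ".jsx")
TEST_PATTERNS = ("*_test.go", "*.test.js", "*.test.cjs", "*.spec.js", "*.spec.cjs",
                 "*.test.ts", "*.spec.ts", "*_test.py", "test_*.py")
TEST_DIRS = {"test", "tests", "__tests__", "spec"}

def should_analyze(path: str) -> bool:
    """Return True if a changed file falls inside the analysis scope."""
    name = os.path.basename(path)
    parents = path.split("/")[:-1]
    if not name.endswith(INCLUDE_EXTS):
        return False  # only Python and TypeScript/JavaScript sources
    if any(fnmatch.fnmatch(name, pat) for pat in TEST_PATTERNS):
        return False  # test files are excluded
    if any(part in TEST_DIRS for part in parents):
        return False  # files under test directories are excluded
    if path.startswith(".github/workflows/"):
        return False  # workflow files are excluded
    return True
```

For example, `should_analyze("src/app.py")` is true, while `should_analyze("tests/test_app.py")` and `should_analyze("src/util.test.ts")` are both false.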
### 3. Duplicate Detection

Apply semantic code analysis to find duplicates:

**Symbol-Level Analysis:**

- For significant functions/methods in changed files, use `find_symbol` to search for similarly named symbols
- Use `find_referencing_symbols` to understand usage patterns
- Identify functions with similar names in different files (e.g., `processData` across modules)
**Pattern Search:**

- Use `search_for_pattern` to find similar code patterns
- Search for duplication indicators:
  - Similar function signatures
  - Repeated logic blocks
  - Similar variable naming patterns
  - Near-identical code blocks
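To make the "similar function signatures" indicator concrete, the sketch below groups functions by their normalized parameter lists. This is a hypothetical illustration, not the Serena `search_for_pattern` tool, and the regex only covers simple Python `def` headers:

```python
import re

# Hypothetical signature-similarity probe (illustration only; the actual
# workflow would use Serena's search_for_pattern tool instead).
SIGNATURE_RE = re.compile(r"def\s+(\w+)\s*\(([^)]*)\)")

def signature_index(source: str) -> dict:
    """Map each normalized parameter list to the function names sharing it."""
    index = {}
    for name, params in SIGNATURE_RE.findall(source):
        # Drop type annotations and whitespace so `uid: int` and `uid` match.
        key = ",".join(p.split(":")[0].strip() for p in params.split(",") if p.strip())
        index.setdefault(key, []).append(name)
    return index
```

Two functions ending up under the same key (e.g. `load_user(uid, db)` and `fetch_user(uid, db)`) would be candidates for a closer semantic comparison.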
**Structural Analysis:**

- Use `list_dir` and `find_file` to identify files with similar names or purposes
- Compare symbol overviews across files for structural similarities
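As a rough illustration of what block-level duplicate detection looks for (independent of Serena's tooling), a sliding window of whitespace-normalized lines can flag near-identical regions across files. All names here are hypothetical:

```python
import hashlib

def duplicate_windows(sources: dict, window: int = 10) -> dict:
    """Hash every `window`-line run of whitespace-normalized code and
    return the hashes shared by more than one location (sketch only)."""
    seen = {}
    for path, text in sources.items():
        lines = [ln.strip() for ln in text.splitlines()]
        for i in range(len(lines) - window + 1):
            digest = hashlib.sha256("\n".join(lines[i:i + window]).encode()).hexdigest()
            seen.setdefault(digest, []).append((path, i + 1))  # 1-based start line
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

Semantic analysis goes further than this textual sketch (it can match blocks after renames or reordering), but the output shape is the same: a set of locations sharing one pattern, which maps directly onto the per-pattern issues described below.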
### 4. Duplication Evaluation

Assess findings to identify true code duplication:

**Duplication Types:**

- **Exact Duplication**: Identical code blocks in multiple locations
- **Structural Duplication**: Same logic with minor variations (different variable names, etc.)
- **Functional Duplication**: Different implementations of the same functionality
- **Copy-Paste Programming**: Similar code blocks that could be extracted into shared utilities
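For instance, the pair below is structural duplication in the sense above: identical logic under different identifiers, exactly the kind of pair worth collapsing into one shared helper (hypothetical example):

```python
# Structural duplication: same logic, different identifiers (hypothetical).
def total_order_price(items):
    result = 0
    for item in items:
        result += item["price"] * item["qty"]
    return result

def sum_cart_cost(entries):
    acc = 0
    for entry in entries:
        acc += entry["price"] * entry["qty"]
    return acc

# The refactoring recommendation would be a single extracted utility:
def line_total(rows):
    return sum(row["price"] * row["qty"] for row in rows)
```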
**Assessment Criteria:**

- **Severity**: Amount of duplicated code (lines of code, number of occurrences)
- **Impact**: Where duplication occurs (critical paths, frequently called code)
- **Maintainability**: How duplication affects code maintainability
- **Refactoring Opportunity**: Whether duplication can be easily refactored
### 5. Issue Reporting

Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run). Each pattern should get its own issue to enable focused remediation.

**When to Create Issues:**

- Only create issues if significant duplication is found (threshold: >10 lines of duplicated code OR 3+ instances of similar patterns)
- Create one issue per distinct pattern - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Use the `create_issue` tool from the safe-outputs MCP once for each pattern
**Issue Contents for Each Pattern:**

- **Executive Summary**: Brief description of this specific duplication pattern
- **Duplication Details**: Specific locations and code blocks for this pattern only
- **Severity Assessment**: Impact and maintainability concerns for this pattern
- **Refactoring Recommendations**: Suggested approaches to eliminate this pattern
- **Code Examples**: Concrete examples with file paths and line numbers for this pattern
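The reporting rules above boil down to a threshold filter plus a top-3 cap. A sketch, where the finding fields (`lines`, `occurrences`) are hypothetical names for the values the analysis would produce:

```python
def select_reportable(findings: list, max_issues: int = 3) -> list:
    """Keep findings over the significance threshold (>10 duplicated lines
    OR 3+ occurrences) and cap at the `max_issues` largest by line count."""
    significant = [f for f in findings
                   if f["lines"] > 10 or f["occurrences"] >= 3]
    significant.sort(key=lambda f: (f["lines"], f["occurrences"]), reverse=True)
    return significant[:max_issues]
```

Each surviving finding then becomes one `create_issue` call, never a bundled issue.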
## Detection Scope

### Report These Issues

- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components
### Skip These Patterns

- Standard boilerplate code (imports, exports, etc.)
- Test setup/teardown code (acceptable duplication in tests)
- All test files (files matching: `*_test.go`, `*.test.js`, `*.test.cjs`, `*.spec.js`, `*.spec.cjs`, `*.test.ts`, `*.spec.ts`, `*_test.py`, `test_*.py`, or in `test/`, `tests/`, `__tests__/`, `spec/` directories)
- All workflow files (files under `.github/workflows/*`)
- Configuration files with similar structure
- Language-specific patterns (constructors, getters/setters)
- Small code snippets (<5 lines) unless highly repetitive
## Analysis Depth

- **File Type Restriction**: ONLY analyze `.py`, `.ts`, `.tsx`, `.js`, and `.jsx` files - ignore all other file types
- **Primary Focus**: All `.py`, `.ts`, `.tsx`, `.js`, and `.jsx` files changed in the current push (excluding test files and workflow files)
- **Secondary Analysis**: Check for duplication with the existing Python and TypeScript/JavaScript codebase (excluding test files and workflow files)
- **Cross-Reference**: Look for patterns across `.py`, `.ts`, `.tsx`, `.js`, and `.jsx` files in the repository
- **Historical Context**: Consider whether duplication is new or pre-existing
## Issue Template
For each distinct duplication pattern found, create a separate issue using this structure:
# 🔍 Duplicate Code Detected: [Pattern Name]
*Analysis of commit ${{ github.event.head_commit.id }}*
**Assignee**: @copilot
## Summary
[Brief overview of this specific duplication pattern]
## Duplication Details

### Pattern: [Description]

- **Severity**: High/Medium/Low
- **Occurrences**: [Number of instances]
- **Locations**:
  - `path/to/file1.ext` (lines X-Y)
  - `path/to/file2.ext` (lines A-B)
- **Code Sample**:

```[language]
[Example of duplicated code]
```
## Impact Analysis

- **Maintainability**: [How this affects code maintenance]
- **Bug Risk**: [Potential for inconsistent fixes]
- **Code Bloat**: [Impact on codebase size]

## Refactoring Recommendations

1. [Recommendation 1]
   - Extract common functionality to: `suggested/path/utility.ext`
   - Estimated effort: [hours/complexity]
   - Benefits: [specific improvements]
2. [Recommendation 2]

[... additional recommendations ...]
## Implementation Checklist

- [ ] Review duplication findings
- [ ] Prioritize refactoring tasks
- [ ] Create refactoring plan
- [ ] Implement changes
- [ ] Update tests
- [ ] Verify no functionality broken

## Analysis Metadata

- **Analyzed Files**: [count]
- **Detection Method**: Serena semantic code analysis
- **Commit**: ${{ github.event.head_commit.id }}
- **Analysis Date**: [timestamp]
## Operational Guidelines
### Security
- Never execute untrusted code or commands
- Only use Serena's read-only analysis tools
- Do not modify files during analysis
### Efficiency
- Focus on recently changed files first
- Use semantic analysis for meaningful duplication, not superficial matches
- Stay within timeout limits (balance thoroughness with execution time)
### Accuracy
- Verify findings before reporting
- Distinguish between acceptable patterns and true duplication
- Consider language-specific idioms and best practices
- Provide specific, actionable recommendations
### Issue Creation
- Create **one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Only create issues if significant duplication is found
- Include sufficient detail for SWE agents to understand and act on findings
- Provide concrete examples with file paths and line numbers
- Suggest practical refactoring approaches
- Assign issue to @copilot for automated remediation
- Use descriptive titles that clearly identify the specific pattern (e.g., "Duplicate Code: Error Handling Pattern in Parser Module")
## Tool Usage Sequence
1. **Project Setup**: `activate_project` with repository path
2. **File Discovery**: `list_dir`, `find_file` for changed files
3. **Symbol Analysis**: `get_symbols_overview` for structure understanding
4. **Content Review**: `read_file` for detailed code examination
5. **Pattern Matching**: `search_for_pattern` for similar code
6. **Symbol Search**: `find_symbol` for duplicate function names
7. **Reference Analysis**: `find_referencing_symbols` for usage patterns
**Objective**: Improve code quality by identifying and reporting meaningful code duplication that impacts maintainability. Focus on actionable findings that enable automated or manual refactoring.