When working with large codebases, having clear documentation about each directory’s purpose and contents is crucial. This guide shows how to use Codegen and AI to automatically generate a hierarchical README that explains your codebase structure.

Generating Directory READMEs

Here’s how to recursively generate README files for each directory using AI:

def generate_directory_readme(directory):
    # Skip non-source directories
    if any(skip in directory.name for skip in [
        'node_modules', 'venv', '.git', '__pycache__', 'build', 'dist'
    ]):
        return
        
    # Collect directory contents for context
    files = [f for f in directory.files if f.is_source_file]
    functions = directory.functions
    classes = directory.classes
    
    # Create context for AI
    context = {
        "Directory Name": directory.name,
        "Files": [f"{f.name} ({len(f.source.splitlines())} lines)" for f in files],
        "Functions": [f.name for f in functions],
        "Classes": [c.name for c in classes]
    }
    
    # Generate directory summary using AI
    readme_content = codebase.ai(
        prompt="""Generate a README section that explains this directory's:
        1. Purpose and responsibility
        2. Key components and their roles
        3. How it fits into the larger codebase
        4. Important patterns or conventions
        
        Keep it clear and concise.""",
        target=directory,
        context=context
    )
    
    # Add file listing
    if files:
        readme_content += "\n\n## Files\n"
        for file in files:
            # Get file summary from AI
            file_summary = codebase.ai(
                prompt="Describe this file's purpose in one line:",
                target=file
            )
            readme_content += f"\n### {file.name}\n{file_summary}\n"
            
            # List key components
            if file.classes:
                readme_content += "\nKey classes:\n"
                for cls in file.classes:
                    readme_content += f"- `{cls.name}`\n"
            if file.functions:
                readme_content += "\nKey functions:\n"
                for func in file.functions:
                    readme_content += f"- `{func.name}`\n"
    
    # Create or update README.md
    readme_path = f"{directory.path}/README.md"
    if codebase.has_file(readme_path):
        readme_file = codebase.get_file(readme_path)
        readme_file.edit(readme_content)
    else:
        readme_file = codebase.create_file(readme_path)
        readme_file.edit(readme_content)
    
    # Recursively process subdirectories
    for subdir in directory.subdirectories:
        generate_directory_readme(subdir)

# Generate READMEs for the entire codebase
generate_directory_readme(codebase.root_directory)

This will create a hierarchy of README.md files that explain each directory’s purpose and contents. For example:

# src/
Core implementation directory containing the main business logic and data models.
This directory is responsible for the core functionality of the application.

## Key Patterns
- Business logic is separated from API endpoints
- Models follow the Active Record pattern
- Services implement the Repository pattern

## Files

### models.py
Defines the core data models and their relationships.

Key classes:
- `User`
- `Product`
- `Order`

### services.py
Implements business logic and data access services.

Key classes:
- `UserService`
- `ProductService`
Key functions:
- `initialize_db`
- `migrate_data`

Customizing the Generation

You can customize the README generation by modifying the prompts and adding more context:

def get_directory_patterns(directory):
    """Analyze common patterns in a directory"""
    patterns = []
    
    # Check for common file patterns
    if any('test_' in f.name for f in directory.files):
        patterns.append("Contains unit tests")
    if any('interface' in f.name.lower() for f in directory.files):
        patterns.append("Uses interface-based design")
    if any(c.is_dataclass for c in directory.classes):
        patterns.append("Uses dataclasses for data models")
        
    return patterns

def generate_enhanced_readme(directory):
    # Get additional context
    patterns = get_directory_patterns(directory)
    dependencies = [imp.module for imp in directory.imports]
    
    # Enhanced context for AI
    context = {
        "Common Patterns": patterns,
        "Dependencies": dependencies,
        "Parent Directory": directory.parent.name if directory.parent else None,
        "Child Directories": [d.name for d in directory.subdirectories],
        "Style": "technical but approachable"
    }
    
    # Generate README with enhanced context
    # ... rest of the generation logic

Best Practices

  1. Keep Summaries Focused: Direct the AI to generate concise, purpose-focused summaries.

  2. Include Key Information:

    • Directory purpose
    • Important patterns
    • Key files and their roles
    • How components work together
  3. Maintain Consistency: Use consistent formatting and structure across all READMEs.

  4. Update Regularly: Regenerate READMEs when directory structure or purpose changes.

  5. Version Control: Commit generated READMEs to track documentation evolution.

The AI-generated summaries are a starting point. Review and refine them to ensure accuracy and completeness.

Be mindful of sensitive information in your codebase. Configure the generator to skip sensitive files or directories.