Aggregator
Purpose
The aggregator is a TypeScript tool that discovers and pulls documentation from every GitHub repository that has a /docs folder and belongs to an organization where the GitHub App is installed. It is the first step in the documentation pipeline, collecting content from multiple repositories into a unified structure.
How It Works
The aggregator follows a simple four-step process:
- Authentication - Retrieves GitHub App credentials from AWS Secrets Manager
- Discovery - Scans all GitHub organizations where the app is installed, finding repositories with /docs folders
- Extraction - Uses sparse checkout to clone only the /docs folder from each repository, keeping downloads lightweight
- Index Generation - Creates the homepage (docs/index.md) with links to all discovered repositories
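The extraction step can be sketched as a pair of git invocations. This is a minimal sketch, not the code in docs-extractor.ts: the function name, the `GitCommand` type, and the example URL are hypothetical, and it assumes a git version that supports `clone --sparse` and `sparse-checkout set`.

```typescript
// Sketch only: names here are hypothetical, not the aggregator's actual API.
type GitCommand = { args: string[]; cwd?: string };

/** Build the git invocations that fetch just one folder of a repository. */
function sparseCheckoutCommands(
  repoUrl: string,
  targetDir: string,
  folder: string = "docs"
): GitCommand[] {
  return [
    // Shallow, blob-less clone that starts with a sparse working tree.
    { args: ["clone", "--depth", "1", "--filter=blob:none", "--sparse", repoUrl, targetDir] },
    // Limit the working tree to the requested folder
    // (in cone mode, files in the repository root are still checked out).
    { args: ["sparse-checkout", "set", folder], cwd: targetDir },
  ];
}

const commands = sparseCheckoutCommands(
  "https://github.com/example-org/example-repo.git", // placeholder URL
  "tmp/example-repo"
);
for (const { args, cwd } of commands) {
  console.log(`${cwd ? `(in ${cwd}) ` : ""}git ${args.join(" ")}`);
}
```

Each entry could then be executed with `execFileSync("git", args, { cwd })` from `node:child_process`. Building the argument lists separately from running them keeps the logic easy to test without network access.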
Structure
The aggregator is organized into focused modules:
- aws-client.ts - Fetches GitHub App credentials from Secrets Manager
- github-client.ts - Authenticates with GitHub App, discovers repositories
- docs-extractor.ts - Clones /docs folders using sparse checkout
- index-generator.ts - Generates MkDocs configuration from aggregated docs
- index.ts - Main entry point that orchestrates the pipeline
Index Generation
The index generator (index-generator.ts) creates the site homepage at docs/index.md. This page serves as a catalog of all aggregated documentation.
Discovery Process
- Scans the docs/ folder structure to find all aggregated repositories
- For each repository, attempts to read metadata from entry files in priority order:
  - index.md (preferred)
  - README.md (falls back to this, becomes index.html)
  - readme.md (last resort, becomes readme.html)
- If no entry file is found, generates a placeholder index.md with:
  - A warning message indicating documentation is missing
  - A direct link to the repository's /docs folder on GitHub
  - Instructions for adding a README.md file
Special Handling for docs-builder
The statista/docs-builder repository (this repo) is handled differently during aggregation:
- Other repos: Content goes to docs/{org}/{repo}/
- docs-builder: Content goes directly to docs/ root
This ensures that meta-documentation like docs/about/ appears at the top level rather than buried under docs/statista/docs-builder/about/. The docs-builder repo is excluded from appearing as a card on the index page.
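The two placement rules could be captured roughly like this. The helper names and the `META_REPO` constant are hypothetical; only the path rules and the exclusion from the index come from the description above.

```typescript
// Hypothetical helpers illustrating the placement rules described above.
const META_REPO = "statista/docs-builder";

/** Directory that a repository's /docs content is copied into. */
function outputDir(org: string, repo: string): string {
  return `${org}/${repo}` === META_REPO ? "docs" : `docs/${org}/${repo}`;
}

/** Whether the repository should get a card on the index page. */
function appearsOnIndex(org: string, repo: string): boolean {
  return `${org}/${repo}` !== META_REPO;
}

console.log(outputDir("PIT-Numera", "compliance-service")); // docs/PIT-Numera/compliance-service
console.log(outputDir("statista", "docs-builder")); // docs
console.log(appearsOnIndex("statista", "docs-builder")); // false
```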
Metadata Extraction
The generator reads YAML frontmatter to extract display information:
---
teaser:
  roof-title: "My Service"
  title: "A comprehensive guide to My Service"
---
Extracted Fields:
- Card Title: teaser.roof-title → title → repository name (fallback)
- Card Description: teaser.title → "No description available" (fallback)
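The fallback chains can be sketched as follows, working on frontmatter that has already been parsed into an object. Two assumptions are made here: the middle `title` in the card-title chain is read as a top-level frontmatter key, and empty strings are treated as missing. The `Frontmatter` and `Card` types and the `buildCard` name are hypothetical.

```typescript
// Hypothetical types; only the field names and fallbacks come from the text above.
interface Frontmatter {
  title?: string; // assumed to be a top-level key
  teaser?: { "roof-title"?: string; title?: string };
}

interface Card {
  title: string;
  description: string;
}

/** Apply the documented fallback chains to already-parsed frontmatter. */
function buildCard(frontmatter: Frontmatter, repoName: string): Card {
  const teaser = frontmatter.teaser || {};
  return {
    // `||` is used so that empty strings also fall through (an assumption).
    title: teaser["roof-title"] || frontmatter.title || repoName,
    description: teaser.title || "No description available",
  };
}

const full = buildCard(
  { teaser: { "roof-title": "My Service", title: "A comprehensive guide to My Service" } },
  "my-service"
);
console.log(full.title); // My Service
console.log(full.description); // A comprehensive guide to My Service

const bare = buildCard({}, "my-service");
console.log(bare.title); // my-service
console.log(bare.description); // No description available
```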
Output Format
Generates a card grid layout grouped by organization:
---
title: Home
---
# Repositories
## PIT-Numera
<div class="grid cards" markdown>
- [**Numera Compliance Service**](PIT-Numera/compliance-service/) <br> Management of compliance requests
- [**Numera Statistic Service**](PIT-Numera/statistic-service/) <br> Storing Statistic content
</div>
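A page of this shape could be produced by a renderer along these lines. This is a sketch under assumptions: the `RepoCard` and `OrgGroup` types and the `renderIndex` name are hypothetical, and the blank lines around each card list are an assumption about what the Markdown renderer needs rather than something the sample output confirms.

```typescript
// Hypothetical data shapes for an illustrative renderer.
interface RepoCard {
  title: string;
  description: string;
  link: string;
}

interface OrgGroup {
  org: string;
  cards: RepoCard[];
}

/** Render the homepage Markdown with one card grid per organization. */
function renderIndex(groups: OrgGroup[]): string {
  const lines: string[] = ["---", "title: Home", "---", "", "# Repositories"];
  for (const group of groups) {
    lines.push("", `## ${group.org}`, "", '<div class="grid cards" markdown>', "");
    for (const card of group.cards) {
      lines.push(`- [**${card.title}**](${card.link}) <br> ${card.description}`);
    }
    lines.push("", "</div>");
  }
  return lines.join("\n") + "\n";
}

console.log(
  renderIndex([
    {
      org: "PIT-Numera",
      cards: [
        {
          title: "Numera Compliance Service",
          description: "Management of compliance requests",
          link: "PIT-Numera/compliance-service/",
        },
      ],
    },
  ])
);
```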
Link Resolution
Different entry files produce different link formats:
| Entry File | Link Format | Resulting URL |
|---|---|---|
| index.md | {org}/{repo}/ | {org}/{repo}/index.html |
| README.md | {org}/{repo}/ | {org}/{repo}/index.html |
| readme.md | {org}/{repo}/readme.html | {org}/{repo}/readme.html |
| none (placeholder) | {org}/{repo}/ | {org}/{repo}/index.html (generated) |
This accounts for MkDocs' use_directory_urls: false configuration.
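The table can be read as a single rule: only a lowercase readme.md needs an explicit file name in the link. A minimal sketch of that rule, with hypothetical function names:

```typescript
// Hypothetical helpers; the mapping itself follows the table above.
type EntryFile = "index.md" | "README.md" | "readme.md" | null; // null = placeholder

/** Link written into the card for a repository. */
function cardLink(org: string, repo: string, entry: EntryFile): string {
  const base = `${org}/${repo}/`;
  return entry === "readme.md" ? `${base}readme.html` : base;
}

/** URL the link resolves to in the built site. */
function resultingUrl(org: string, repo: string, entry: EntryFile): string {
  const link = cardLink(org, repo, entry);
  return link.charAt(link.length - 1) === "/" ? `${link}index.html` : link;
}

console.log(cardLink("PIT-Numera", "compliance-service", "README.md")); // PIT-Numera/compliance-service/
console.log(cardLink("PIT-Numera", "compliance-service", "readme.md")); // PIT-Numera/compliance-service/readme.html
console.log(resultingUrl("PIT-Numera", "compliance-service", null)); // PIT-Numera/compliance-service/index.html
```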
Usage
Run the full aggregation:
cd aggregator
pnpm build
pnpm start
Regenerate only the index (requires docs already aggregated):
pnpm generate-index
The aggregator outputs to the docs/ directory, organized by organization and repository name.