Aggregator

Purpose

The aggregator is a TypeScript tool that discovers every GitHub repository with a /docs folder, across all organizations where the GitHub App is installed, and pulls that documentation. It is the first step in the documentation pipeline, collecting content from multiple repositories into a unified structure.

How It Works

The aggregator follows a simple four-step process:

1. Authentication - Retrieves GitHub App credentials from AWS Secrets Manager
2. Discovery - Scans all GitHub organizations where the app is installed, finding repositories with /docs folders
3. Extraction - Uses sparse checkout to clone only the /docs folder from each repository, keeping downloads lightweight
4. Index Generation - Creates the homepage (docs/index.md) with links to all discovered repositories
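
The extraction step can be pictured as a short sequence of git invocations. The following is a minimal sketch, not the code in docs-extractor.ts: the helper name sparseCheckoutCommands, its parameters, and the example URL are assumptions made for illustration. The git flags themselves (--depth, --filter=blob:none, --sparse, and the sparse-checkout set subcommand) are standard git options.

```typescript
// Hypothetical helper: builds the git commands for a docs-only checkout.
// It only returns the argument lists; running them is left to the caller.
type GitCommand = string[];

function sparseCheckoutCommands(
  repoUrl: string,
  targetDir: string,
  folder: string = "docs",
): GitCommand[] {
  return [
    // Shallow, blob-less clone that starts in sparse mode
    // (only files in the repository root are checked out at first).
    ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse", repoUrl, targetDir],
    // Extend the sparse working tree to the requested folder.
    ["git", "-C", targetDir, "sparse-checkout", "set", folder],
  ];
}

// Example with a made-up repository URL.
const commands = sparseCheckoutCommands(
  "https://github.com/example-org/example-repo.git",
  "tmp/example-repo",
);
for (const cmd of commands) {
  console.log(cmd.join(" "));
}
```

Returning the argument lists instead of executing them keeps the sketch easy to inspect; a real implementation would pass each list to a process runner and handle failures per repository.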

Structure

The aggregator is organized into focused modules:

aws-client.ts - Fetches GitHub App credentials from Secrets Manager
github-client.ts - Authenticates with GitHub App, discovers repositories
docs-extractor.ts - Clones /docs folders using sparse checkout
index-generator.ts - Generates the site homepage (docs/index.md) from the aggregated docs
index.ts - Main entry point that orchestrates the pipeline

Index Generation

The index generator (index-generator.ts) creates the site homepage at docs/index.md. This page serves as a catalog of all aggregated documentation.

Discovery Process

1. Scans the docs/ folder structure to find all aggregated repositories.
2. For each repository, attempts to read metadata from entry files in priority order:
   - index.md (preferred)
   - README.md (fallback, rendered as index.html)
   - readme.md (last resort, rendered as readme.html)
3. If no entry file is found, generates a placeholder index.md with:
   - A warning message indicating documentation is missing
   - A direct link to the repository's /docs folder on GitHub
   - Instructions for adding a README.md file
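
The priority order and the placeholder fallback can be sketched as two small functions. This is an illustration under assumptions: the names pickEntryFile and placeholderIndex, the placeholder wording, and the tree/HEAD/docs link format are hypothetical and not taken from index-generator.ts.

```typescript
// Entry files in the priority order described above.
const ENTRY_FILES: string[] = ["index.md", "README.md", "readme.md"];

// Hypothetical helper: returns the first entry file present in a repo's
// docs folder, or null when none of the candidates exists.
function pickEntryFile(fileNames: string[]): string | null {
  for (const candidate of ENTRY_FILES) {
    if (fileNames.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return null;
}

// Hypothetical placeholder content; the real wording is not shown in this
// document. The link assumes that HEAD resolves to the default branch.
function placeholderIndex(org: string, repo: string): string {
  const docsUrl = `https://github.com/${org}/${repo}/tree/HEAD/docs`;
  return [
    `# ${repo}`,
    "",
    "> **Warning:** No documentation entry file was found for this repository.",
    "",
    `Browse the raw files: [${org}/${repo}/docs](${docsUrl})`,
    "",
    "To fix this, add a README.md file to the /docs folder.",
    "",
  ].join("\n");
}
```

A caller would first try pickEntryFile on the directory listing and write placeholderIndex output to index.md only when the result is null.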

Special Handling for docs-builder

The statista/docs-builder repository (this repo) is handled differently during aggregation:

Other repos: Content goes to docs/{org}/{repo}/
docs-builder: Content goes directly to docs/ root

This ensures that meta-documentation like docs/about/ appears at the top level rather than buried under docs/statista/docs-builder/about/. The docs-builder repo is also left out of the card grid on the index page.
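
The routing rule can be expressed as a pair of small functions. This is a sketch under assumptions: the names targetDir and isListedOnIndex and the docsRoot parameter are hypothetical, and only the statista/docs-builder special case comes from the text above.

```typescript
// The repository that is merged into the docs root.
const ROOT_REPO = "statista/docs-builder";

// Hypothetical helper: where the content of a repository ends up.
function targetDir(org: string, repo: string, docsRoot: string = "docs"): string {
  if (`${org}/${repo}` === ROOT_REPO) {
    return `${docsRoot}/`;
  }
  return `${docsRoot}/${org}/${repo}/`;
}

// Hypothetical helper: whether a repository gets a card on the index page.
function isListedOnIndex(org: string, repo: string): boolean {
  return `${org}/${repo}` !== ROOT_REPO;
}

console.log(targetDir("statista", "docs-builder"));
console.log(targetDir("PIT-Numera", "compliance-service"));
```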

Metadata Extraction

The generator reads YAML frontmatter to extract display information:

---
teaser:
  roof-title: "My Service"
  title: "A comprehensive guide to My Service"
---

Extracted Fields:

Card Title: teaser.roof-title → title → repository name (fallback)
Card Description: teaser.title → "No description available" (fallback)
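
The fallback chains can be written as one small function over already-parsed frontmatter. This is a minimal sketch: the Frontmatter and CardInfo types and the cardInfo name are hypothetical, YAML parsing is assumed to have happened elsewhere, and the sketch assumes that "title" in the chain means the top-level title field and that empty strings fall through to the next candidate.

```typescript
// Assumed shape of the parsed frontmatter; only the fields used here.
interface Frontmatter {
  title?: string;
  teaser?: {
    "roof-title"?: string;
    title?: string;
  };
}

interface CardInfo {
  title: string;
  description: string;
}

// Hypothetical helper implementing the two fallback chains.
function cardInfo(frontmatter: Frontmatter, repoName: string): CardInfo {
  return {
    // teaser.roof-title -> title -> repository name
    title: frontmatter.teaser?.["roof-title"] || frontmatter.title || repoName,
    // teaser.title -> fixed fallback text
    description: frontmatter.teaser?.title || "No description available",
  };
}

// Usage with the frontmatter example from this section.
const card = cardInfo(
  { teaser: { "roof-title": "My Service", title: "A comprehensive guide to My Service" } },
  "my-service",
);
console.log(card.title, "/", card.description);
```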

Output Format

Generates a card grid layout grouped by organization:

---
title: Home
---
# Repositories

## PIT-Numera

<div class="grid cards" markdown>

- [**Numera Compliance Service**](PIT-Numera/compliance-service/) <br> Management of compliance requests
- [**Numera Statistic Service**](PIT-Numera/statistic-service/) <br> Storing Statistic content

</div>
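
A renderer for this layout could look roughly like the following. It is a sketch, not the actual generator: the Card type and the renderIndex name are assumptions, and organizations are emitted in the order in which they first appear in the input.

```typescript
// Assumed card shape; the field names are hypothetical.
interface Card {
  org: string;
  title: string;
  link: string;
  description: string;
}

function renderIndex(cards: Card[]): string {
  // Group cards by organization, keeping first-seen order.
  const orgs: string[] = [];
  const groups: Card[][] = [];
  for (const card of cards) {
    let i = orgs.indexOf(card.org);
    if (i === -1) {
      orgs.push(card.org);
      groups.push([]);
      i = orgs.length - 1;
    }
    groups[i].push(card);
  }

  const lines: string[] = ["---", "title: Home", "---", "# Repositories", ""];
  for (let i = 0; i < orgs.length; i++) {
    lines.push(`## ${orgs[i]}`, "", '<div class="grid cards" markdown>', "");
    for (const card of groups[i]) {
      lines.push(`- [**${card.title}**](${card.link}) <br> ${card.description}`);
    }
    lines.push("", "</div>", "");
  }
  return lines.join("\n");
}

console.log(
  renderIndex([
    {
      org: "PIT-Numera",
      title: "Numera Compliance Service",
      link: "PIT-Numera/compliance-service/",
      description: "Management of compliance requests",
    },
  ]),
);
```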

Link Resolution

Different entry files produce different link formats:

Entry File	Link Format	Resulting URL
`index.md`	`{org}/{repo}/`	`{org}/{repo}/index.html`
`README.md`	`{org}/{repo}/`	`{org}/{repo}/index.html`
`readme.md`	`{org}/{repo}/readme.html`	`{org}/{repo}/readme.html`
none (placeholder)	`{org}/{repo}/`	`{org}/{repo}/index.html` (generated)

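These link formats account for the use_directory_urls: false setting in the MkDocs configuration, under which each page is built as a .html file next to its source path (readme.md becomes readme.html) instead of as its own directory.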
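
The mapping in the table reduces to a single special case, which the following sketch makes explicit. The EntryFile type and the cardLink name are assumptions for illustration; null stands for the generated placeholder.

```typescript
// Entry file that was found for a repository; null means "placeholder generated".
type EntryFile = "index.md" | "README.md" | "readme.md" | null;

// Hypothetical helper: link used on the index page for a repository card.
function cardLink(org: string, repo: string, entry: EntryFile): string {
  const base = `${org}/${repo}/`;
  // Only a lowercase readme.md is built as readme.html; index.md, README.md
  // and the placeholder all end up as index.html behind the directory link.
  return entry === "readme.md" ? `${base}readme.html` : base;
}

console.log(cardLink("PIT-Numera", "compliance-service", "index.md"));
console.log(cardLink("PIT-Numera", "statistic-service", "readme.md"));
```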

Usage

Run the full aggregation:

cd aggregator
pnpm build
pnpm start

Regenerate only the index (requires docs already aggregated):

pnpm generate-index

The aggregator outputs to the docs/ directory, organized by organization and repository name.

Last updated: March 31, 2026 at 11:02

By: Axel Tetzlaff

Repository: statista/docs-builder