Modelling architecture in Datadog

The Problem with Architecture Modelling

Most architecture diagrams are static and quickly become outdated. They are just snapshots in time and rarely reflect the current state of the system. Nothing is worse than an architecture diagram that nobody knows is still correct.

We tried using IcePanel, but without regular maintenance, it became useless. Ad-hoc meetings and drawing arrows in Miro boards didn’t help either. When it came to answering who owns what, we relied on Excel sheets, which led to many complaints.

To solve these problems, we can use Datadog’s Software Catalog to model our architecture as living documentation. This guide will help you define and organize your infrastructure components using Datadog Entities and take your Software Catalog to the next level.

Imaginary Scenario

Imagine the following application: a frontend (e.g. Remix) checks a Redis cache for data. On a cache miss, it calls a backend API (e.g. Java) through an NLB. The backend API fetches data from an RDS database (e.g. PostgreSQL) and receives updates through an SQS queue.

[Diagram: Imaginary Application. Components: Frontend App, NLB, Backend API, Redis Heap Cache, Statistics Database, Update Queue. Connections: Frontend App to NLB, NLB to Backend API, Frontend App to Redis Heap Cache, Backend API to Statistics Database, Backend API to Update Queue.]

The question is: how do we model this properly in Datadog?

Software Catalog

Datadog's Software Catalog lets you define, group, and relate entities that represent different parts of your architecture. See the official documentation for more details: Datadog Software Catalog.

So which entities does Datadog offer by default?

System - a logical group of services, databases, queues etc.
Service - a concrete service, e.g. a Fargate-Service
Datastore - a database, e.g. RDS or DynamoDB
Queue - e.g. SQS or Topics
Frontend - a front-facing application, e.g. a React app (open question: is a Remix app both a Service and a Frontend?)
API - OpenAPI specs pushed to Datadog
External Providers - e.g. Stripe or Auth0
(custom entities) - you can define your own entities as well

How could we map this to the imaginary application above?

Application Component	Datadog Entity
Frontend App	Frontend
Backend API	Service
Statistics Database	Datastore
Redis Heap Cache	Datastore
NLB	LoadBalancer (custom)
Update Queue	Queue

The whole application (assuming it's a microservice) could be modeled as a System entity.

Expressed as entity.datadog.yaml definitions, this could look like:

apiVersion: v3
kind: system
metadata:
  name: imaginary-system
  displayName: Imaginary System
  owner: architects
  description: an imaginary application for education
spec:
  lifecycle: none-live
  components:
    - backend-api
    - remix-app
---
apiVersion: v3
kind: service
metadata:
  name: backend-api
  displayName: Backend API
  owner: architects
  description: the Backend API
spec:
  lifecycle: none-live
  languages:
    - java
  dependsOn:
    - statistics-db
    - update-queue
---
apiVersion: v3
kind: frontend
metadata:
  name: remix-app
  displayName: Remix App
  owner: architects
  description: the Frontend Remix-App in the "Imaginary System"
spec:
  lifecycle: none-live
  type: frontend
  dependsOn:
    - backend-api
    - redis-cache

---
apiVersion: v3
kind: datastore
metadata:
  name: statistics-db
  displayName: Statistics DB
  owner: architects
  description: the DB holding Statistics for the "Imaginary System"
spec:
  lifecycle: none-live
  type: db
---
apiVersion: v3
kind: datastore
metadata:
  name: redis-cache
  displayName: Heap Cache
  owner: architects
  description:
    the Redis Cache for fast statistics lookups in the "Imaginary System"
spec:
  lifecycle: none-live
  type: db
---
apiVersion: v3
kind: queue
metadata:
  name: update-queue
  displayName: Statistics Update Queue
  owner: architects
  description: Statistics Updates SQS in the "Imaginary System"
spec:
  lifecycle: none-live
  type: queue

This results in the following structure in Datadog:

[Image: Datadog Software Catalog Example]

See the Service in Datadog.

Yeah, that looks nice!

Tagging & Inference

So far so good: the theory looks nice. But how do we connect the actual resources in AWS (or any other cloud provider) to these entities?

Right now, the only Datadog entity we have is the Service, which is tagged automatically by our Datadog CDK construct. How do we define the other entities properly?

The (partial) answer is inference: Datadog automatically infers calls to, e.g., queues or databases. All you need to do is define metadata for those peer entities. See the official documentation for more details: Datadog Inference.

This datastore (or queue) can then be used in the dependsOn field of the Service entity.

So Service or Frontend entities declare dependsOn relationships to Datastore, Queue or External Provider entities (and, as with the Remix app above, to other Services). System entities have components, which are Service, Frontend or API entities.
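
The relationship rules above can be sketched as a small consistency check. This is only an illustrative model of the rules as summarised in this article, not Datadog's own schema validation. The kind name "external-provider" and the flat dictionaries are assumptions made for the example, and "service" is included as a dependsOn target because the Remix app above depends on backend-api.

```python
# Rules as summarised in this article (illustrative, not an official schema).
DEPENDS_ON_SOURCES = {"service", "frontend"}
DEPENDS_ON_TARGETS = {"service", "datastore", "queue", "external-provider"}
COMPONENT_KINDS = {"service", "frontend", "api"}

# Entity name -> kind for the imaginary application.
KINDS = {
    "imaginary-system": "system",
    "backend-api": "service",
    "remix-app": "frontend",
    "statistics-db": "datastore",
    "redis-cache": "datastore",
    "update-queue": "queue",
}

# Relationships taken from the entity definitions above.
DEPENDS_ON = {
    "backend-api": ["statistics-db", "update-queue"],
    "remix-app": ["backend-api", "redis-cache"],
}
COMPONENTS = {"imaginary-system": ["backend-api", "remix-app"]}


def find_violations(kinds, depends_on, components):
    """Return one message per relationship that breaks the rules above."""
    problems = []
    for source, targets in depends_on.items():
        if kinds[source] not in DEPENDS_ON_SOURCES:
            problems.append(f"{source} ({kinds[source]}) should not declare dependsOn")
        for target in targets:
            if kinds[target] not in DEPENDS_ON_TARGETS:
                problems.append(f"{source} dependsOn {target} ({kinds[target]})")
    for system, members in components.items():
        if kinds[system] != "system":
            problems.append(f"{system} ({kinds[system]}) should not declare components")
        for member in members:
            if kinds[member] not in COMPONENT_KINDS:
                problems.append(f"{system} lists {member} ({kinds[member]}) as a component")
    return problems


print(find_violations(KINDS, DEPENDS_ON, COMPONENTS))  # prints []
```

The imaginary application passes the check. Listing a datastore as a System component, for example, would be reported.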

Real-life Manticore Example

Thanks to inference, we see two Datastore (MySQL) peers for Manticore: one for prod and one for stage. Both are logically the same datastore, just in different environments. This is how we defined our Manticore Search database in Datadog based on inference:

apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:nlb-search-content-internal.statista.com
  displayName: Manticore DB
  owner: search-recommendations
spec:
  lifecycle: production
  tier: critical
  type: db
  componentOf:
    - service:manticore
---
apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:manticore.frontendlegacy.stage.aws.statista.com
  displayName: Manticore DB
  owner: search-recommendations
spec:
  lifecycle: staging
  tier: critical
  type: db
  componentOf:
    - service:manticore

Notice the name field, which contains the peer tags (this is how Datadog identifies the actual resource in AWS).

The problem is that Datadog treats these as two separate databases (one for stage, one for prod) because they have different names. However, they are essentially the same database, just in different environments. We do not have a unified Service UI in Datadog, and the Environment Switch does not work as expected. We also do not want to create two separate entities in Datadog, since they represent the same logical database. Ideally, we want a single entity Manticore DB that is used across multiple environments (stage and prod).

An escape hatch

The peer tags in the name field of a datastore or queue are inferred from, e.g., the database name or queue name. So if we name our RDS database "users" in every environment, we only need to define one datastore entity in Datadog:

apiVersion: v3
kind: datastore
metadata:
  name: peer.db.name:users,peer.db.system:mysql
  displayName: Users Database
spec:
  tier: critical
  type: db

See the documentation for which peer tags are available and how Datadog tries to infer the peer from various AWS infrastructure.
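
To see why the choice of peer tags decides how many entities appear, here is a minimal sketch. It does not model how Datadog actually selects peer tags; it only joins a chosen set of tags into the name format used in this article. The hostnames and span tag dictionaries are hypothetical, and the alphabetical key order is chosen only because it matches the examples above.

```python
def peer_name(span_tags, keys):
    """Join the chosen peer tags as comma-separated 'key:value' pairs."""
    return ",".join(f"{key}:{span_tags[key]}" for key in sorted(keys))


# Hypothetical tags observed on calls to the same logical database.
prod = {
    "peer.db.system": "mysql",
    "peer.db.name": "users",
    "peer.hostname": "users.prod.example.internal",
}
stage = {
    "peer.db.system": "mysql",
    "peer.db.name": "users",
    "peer.hostname": "users.stage.example.internal",
}

# Identified by hostname: one name per environment.
by_host = {peer_name(t, ["peer.db.system", "peer.hostname"]) for t in (prod, stage)}
# Identified by database name: both environments collapse into one name.
by_db = {peer_name(t, ["peer.db.system", "peer.db.name"]) for t in (prod, stage)}

print(len(by_host))   # prints 2
print(sorted(by_db))  # prints ['peer.db.name:users,peer.db.system:mysql']
```

With a shared database name, a single datastore definition covers every environment. With hostnames, each environment produces its own name.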

Fixing this across our whole infrastructure would require a lot of work and might not even work for all infrastructure components. For new greenfield projects, this sounds like a promising way to go.

I would prefer an AWS tag that can override this, e.g. datadog:peer.db.name=users,peer.db.system:mysql, but this is not available right now.

To make it even worse, Manticore doesn't use a database; it uses indexes, which Datadog doesn't understand at all. To fix this and bring all Manticore instances together, this might work:

apiVersion: v3
kind: datastore
metadata:
  name:
    peer.db.system:mysql,peer.hostname:(nlb-search-content-internal.statista.com
    OR peer.hostname:manticore.frontendlegacy.stage.aws.statista.com)

This works in the UI, but not in entity definitions, because Datadog does not support OR in the name field the way the UI does. Another problem is the discoverability of the peer tag values: you need to know them in advance. So it's all about finding the right peer tags that Datadog infers inside the Software Catalog.

The following doesn't work either, because it won't OR the multiple peer.hostname values:

apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:nlb-search-content-internal.statista.com,peer.hostname:manticore.frontendlegacy.stage.aws.statista.com

So ORing conditions in peer tags seems to be a missing feature in Datadog right now.
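
Until OR is supported, one fallback is to keep one definition per hostname but generate them from a single source, so that displayName, owner and tier cannot drift apart. The sketch below only reproduces the two Manticore definitions shown earlier. It does not merge them into one entity, and the helper name render_datastore is made up for this example.

```python
def render_datastore(hostname, lifecycle):
    """Render one datastore definition for a given peer hostname."""
    return "\n".join([
        "apiVersion: v3",
        "kind: datastore",
        "metadata:",
        f"  name: peer.db.system:mysql,peer.hostname:{hostname}",
        "  displayName: Manticore DB",
        "  owner: search-recommendations",
        "spec:",
        f"  lifecycle: {lifecycle}",
        "  tier: critical",
        "  type: db",
        "  componentOf:",
        "    - service:manticore",
    ])


# Lifecycle -> hostname, taken from the Manticore definitions above.
ENVIRONMENTS = {
    "production": "nlb-search-content-internal.statista.com",
    "staging": "manticore.frontendlegacy.stage.aws.statista.com",
}

documents = [render_datastore(host, lifecycle) for lifecycle, host in ENVIRONMENTS.items()]
print("\n---\n".join(documents))
```

This keeps the duplication mechanical, but Datadog still shows two separate datastores.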

Maybe Manticore is just a bad example?

Storing those definitions

I strongly advise modelling your infrastructure inside a repository (it just needs to be integrated into Datadog properly, then it will be read automatically on a regular basis). For experimenting, I prefer to use the Datadog UI; those manually created entities can be deleted at any time.
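
Once the definitions live in a repository, a small pre-merge check can catch references that point nowhere. The sketch below is an assumption about how such a check might look, not an existing Datadog tool. It works on already parsed, flattened entities (a hypothetical shape, not the nested metadata/spec layout) and uses the kind:name reference style seen in componentOf above. Entities that Datadog discovers on its own can be passed in as known_external.

```python
def unresolved_references(entities, known_external=()):
    """List (source, reference) pairs whose target is neither defined nor allow-listed."""
    defined = {f"{e['kind']}:{e['name']}" for e in entities}
    defined.update(known_external)
    unresolved = []
    for entity in entities:
        source = f"{entity['kind']}:{entity['name']}"
        for field in ("dependsOn", "components", "componentOf"):
            for ref in entity.get(field, []):
                if ref not in defined:
                    unresolved.append((source, ref))
    return unresolved


# Flattened example entities; the queue definition is missing on purpose.
entities = [
    {"kind": "service", "name": "backend-api",
     "dependsOn": ["datastore:statistics-db", "queue:update-queue"]},
    {"kind": "datastore", "name": "statistics-db"},
]

print(unresolved_references(entities))
# prints [('service:backend-api', 'queue:update-queue')]
```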

Conclusion

In theory this all looks nice and shiny, but in practice there are still some open questions. In particular: how do we handle multiple environments (stage, prod) for the same logical entity (e.g. a database) when no common peer tag can be found?

Open Questions

What happens if we tag databases or queues with service:backend-api? Is this wrong?
Imagine a DB is tagged as service:backend-api and sends its slow logs to Datadog: those logs will appear under service:backend-api log searches. Is this correct?
Do we need to introduce additional tags, or do we remove the service tag from databases/queues?

Last updated: January 05, 2026 at 13:31

By: Markus Wolf

Repository: PIT-Shared/architecture