Modelling architecture in Datadog
The Problem with Architecture Modelling
Most architecture diagrams are static and quickly become outdated. They are just snapshots in time and rarely reflect the current state of the system. Nothing is worse than an architecture diagram that nobody knows is still correct.
We tried using IcePanel, but without regular maintenance, it became useless. Ad-hoc meetings and drawing arrows in Miro boards didn’t help either. When it came to answering who owns what, we relied on Excel sheets, which led to many complaints.
To solve these problems, we can use Datadog’s Software Catalog to model our architecture as living documentation. This guide will help you define and organize your infrastructure components using Datadog Entities and take your Software Catalog to the next level.
Imaginary Scenario
Imagine the following application: a frontend (e.g. Remix) checks a Redis cache for data and, if the data is not available there, calls a backend API (e.g. Java). The backend API fetches data from an RDS database (e.g. PostgreSQL) and receives updates through an SQS queue.
[Diagram: Imaginary Application. Components: Frontend App, NLB, Backend API, Statistics Database, Redis Heap Cache, Update Queue. Connections: Frontend App to NLB, NLB to Backend API, Frontend App to Redis Heap Cache, Backend API to Statistics Database, Backend API to Update Queue.]
The question is: how do we model this properly in Datadog?
Software Catalog
Datadog's Software Catalog lets you define, group, and relate entities that represent different parts of your architecture. See the official documentation for more details: Datadog Software Catalog.
So which entities does Datadog offer by default?
- System: a logical group of services, databases, queues etc.
- Service: a concrete service, e.g. a Fargate service
- Datastore: a database, e.g. RDS or DynamoDB
- Queue: e.g. SQS or topics
- Frontend: a front-facing application, e.g. a React app (how does a Remix app fit in here? Is it both service and frontend?)
- API: OpenAPI specs pushed to Datadog
- External Providers: e.g. Stripe or Auth0
- (custom entities): you can define your own entities as well
How could we map this to the imaginary application above?
| Application Component | Datadog Entity |
|---|---|
| Frontend App | Frontend |
| Backend API | Service |
| Statistics Database | Datastore |
| Redis Heap Cache | Datastore |
| NLB | LoadBalancer (custom) |
| Update Queue | Queue |
The whole application (assuming it's a microservice) could be modeled as a
System entity.
Formulated as entity.datadog.yaml files, this could look like:
apiVersion: v3
kind: system
metadata:
  name: imaginary-system
  displayName: Imaginary System
  owner: architects
  description: an imaginary application for education
spec:
  lifecycle: none-live
  components:
    - backend-api
    - remix-app
---
apiVersion: v3
kind: service
metadata:
  name: backend-api
  displayName: Backend API
  owner: architects
  description: the Backend API
spec:
  lifecycle: none-live
  languages:
    - java
  dependsOn:
    - statistics-db
    - update-queue
---
apiVersion: v3
kind: frontend
metadata:
  name: remix-app
  displayName: Remix App
  owner: architects
  description: the Frontend Remix-App in the "Imaginary System"
spec:
  lifecycle: none-live
  type: frontend
  dependsOn:
    - backend-api
    - redis-cache
---
apiVersion: v3
kind: datastore
metadata:
  name: statistics-db
  displayName: Statistics DB
  owner: architects
  description: the DB holding Statistics for the "Imaginary System"
spec:
  lifecycle: none-live
  type: db
---
apiVersion: v3
kind: datastore
metadata:
  name: redis-cache
  displayName: Heap Cache
  owner: architects
  description: the Redis Cache for fast statistics lookups in the "Imaginary System"
spec:
  lifecycle: none-live
  type: db
---
apiVersion: v3
kind: queue
metadata:
  name: update-queue
  displayName: Statistics Update Queue
  owner: architects
  description: Statistics Updates SQS in the "Imaginary System"
spec:
  lifecycle: none-live
  type: queue
This results in the following structure in Datadog; see the Service in Datadog.
Yeah, that looks nice!
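The NLB from the mapping table is not part of the definitions above, because it needs a custom entity type. A minimal sketch of how that could look, assuming a custom kind named loadbalancer has already been registered in the Software Catalog. The kind name, the entity name imaginary-nlb, and the use of componentOf with a kind:name reference on a custom kind are assumptions for illustration, not something confirmed against the schema:

```yaml
apiVersion: v3
# Hypothetical custom kind; it has to exist in the Software Catalog first.
kind: loadbalancer
metadata:
  name: imaginary-nlb # illustrative name, not taken from a real setup
  displayName: NLB
  owner: architects
  description: the NLB in front of the Backend API
spec:
  # Assumption: custom kinds accept componentOf like the built-in kinds do.
  componentOf:
    - system:imaginary-system
```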
Tagging & Inference
So far so good, the theory looks nice. But how do we connect the actual resources in AWS (or any other cloud provider) to these entities?
Right now we only have service as a Datadog entity that is automatically
tagged by our Datadog CDK construct. How do we define the other entities properly?
The (partial) answer is inference: Datadog automatically infers calls to,
e.g., queues or databases. All you need to do is define metadata for those
peer entities. See the official documentation for more details:
Datadog Inference.
This datastore (or queue) can then be used in the dependsOn field of the
Service entity.
So Service or Frontend entities depend on (dependsOn) Datastore, Queue or
External Provider entities, and System entities have components of type
Service, Frontend or API.
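These relationships can also be written with explicit kind:name references, the notation the Manticore example below uses for componentOf (service:manticore). A sketch for the imaginary application, assuming that dependsOn and components accept the same notation and that frontend is a valid reference prefix; only the relationship fields are shown:

```yaml
apiVersion: v3
kind: system
metadata:
  name: imaginary-system
spec:
  # A system groups its services and frontends.
  components:
    - service:backend-api
    - frontend:remix-app # assumption: "frontend" works as a reference prefix
---
apiVersion: v3
kind: service
metadata:
  name: backend-api
spec:
  # A service depends on datastores and queues.
  dependsOn:
    - datastore:statistics-db
    - queue:update-queue
```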
Real-Life Manticore Example
Due to the inference capability we see two Datastore (MySQL) peers for Manticore:
one for prod and one for stage. Both are literally the same datastore,
just in different environments. This is how we have defined our Manticore Search
database in Datadog based on the inference capability:
apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:nlb-search-content-internal.statista.com
  displayName: Manticore DB
  owner: search-recommendations
spec:
  lifecycle: production
  tier: critical
  type: db
  componentOf:
    - service:manticore
---
apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:manticore.frontendlegacy.stage.aws.statista.com
  displayName: Manticore DB
  owner: search-recommendations
spec:
  lifecycle: staging
  tier: critical
  type: db
  componentOf:
    - service:manticore
Notice the name field, which contains the peer tags (this is how Datadog
identifies the actual resource in AWS).
The problem is that Datadog treats these as two separate databases (one for
stage, one for prod) because they have different names. However, they are
essentially the same database, just in different environments. We do not have a
unified Service UI in Datadog, and the Environment Switch does not work as
expected. We also do not want to create two separate entities in Datadog, since
they represent the same logical database. Ideally, we want a single entity
Manticore DB that is used across multiple environments (stage and prod).
An escape hatch
The peer tags in the name field of the datastore or queue are inferred
from, e.g., the database name or queue name. So if we name our RDS database
"users" in every environment, we would only need to define one datastore
entity in Datadog:
apiVersion: v3
kind: datastore
metadata:
  name: peer.db.name:users,peer.db.system:mysql
  displayName: Users Database
spec:
  tier: critical
  type: db
See the documentation for which peer tags are available and how Datadog
tries to infer the peer from various AWS infrastructure.
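Applied to the imaginary application, the Statistics Database could get a single entity for all environments. This is only a sketch: it assumes the RDS database is named statistics in every environment and that postgresql is the value reported for peer.db.system. Both values are assumptions and need to be checked against what Datadog actually infers.

```yaml
apiVersion: v3
kind: datastore
metadata:
  # Assumed peer tag values; verify them in the Software Catalog first.
  name: peer.db.name:statistics,peer.db.system:postgresql
  displayName: Statistics DB
  owner: architects
spec:
  lifecycle: none-live
  type: db
```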
Fixing this across our whole infrastructure would require a lot of work, and it might not even work for all infrastructure components. For new greenfield projects this sounds like a promising way to go.
I would prefer an AWS tag that can override this, e.g.
datadog:peer.db.name=users,peer.db.system:mysql, but this is not available right now.
To make it even worse, the Manticore DB doesn't use a database. Instead it uses
indexes, which Datadog doesn't understand at all. To fix this and bring all
Manticore instances together, this might work:
apiVersion: v3
kind: datastore
metadata:
  name:
    peer.db.system:mysql,peer.hostname:(nlb-search-content-internal.statista.com
    OR peer.hostname:manticore.frontendlegacy.stage.aws.statista.com)
This works in the UI, but not in entity definitions, because
Datadog does not support OR in the name field the way the UI does. Another
problem is the discoverability of the peer tag values: you need to know them
in advance. So it is all about finding the right peer tags that Datadog
infers inside the Software Catalog.
The following doesn't work either, because it won't OR the multiple
peer.hostname values:
apiVersion: v3
kind: datastore
metadata:
  name: peer.db.system:mysql,peer.hostname:nlb-search-content-internal.statista.com,peer.hostname:manticore.frontendlegacy.stage.aws.statista.com
So ORing conditions in peer tags seems to be a missing feature in Datadog
right now.
Maybe Manticore is just a bad example?
Storing those definitions
I strongly advise modelling your infrastructure inside a repository (it just needs to be integrated into Datadog properly, then it will be read automatically on a regular basis). For experimenting I prefer the Datadog UI; those manually created entities can be deleted at any time.
Conclusion
In theory this all looks nice and shiny, but in practice there are still some open questions. In particular: how do we handle multiple environments (stage, prod) for the same logical entity (e.g. a database) when no common peer tag can be found?
Open Questions
- What happens if we tag databases or queues with service:backend-api? Is this wrong?
- Imagine a DB is tagged as service:backend-api and sends its slow logs to Datadog. Those logs will then appear under service:backend-api log searches. Is this correct?
- Do we need to introduce additional tags, or do we remove the service tag for databases/queues?