Decision to store user consent data in DynamoDB

Context

Currently, we collect user consent data through:

Braze unsubscribe emails
Preference Center
User registration processes
Checkout processes

This data is stored entirely in Segment CDP, which has led to several issues:

Single Point of Failure
Segment is the only place where consent data is stored. If Segment has an outage or loses data, we risk permanently losing user consent records.
No Control Over the Data
All data lives in Segment’s systems. We can’t back it up, move it easily, or review it within our own infrastructure.
Tightly Connected to Our Platform
Segment is deeply integrated into our platform, making it difficult to update or change the system without breaking things.
No Clear Environment Separation
Development, staging, and production environments are not clearly separated in the current setup, which increases the risk of errors.
Complex Data Handling
Segment uses many custom functions to move and change data. These are hard to maintain, especially when changes are needed quickly.

To solve these problems, we decided to add a new database layer to store consent data in our own system. This database must:

Handle high read traffic
Be simple to use and maintain
Work across multiple regions
Be cost-effective
Be ready to become the main source of truth in the future

Decision

We will introduce DynamoDB as an intermediary database between the Statista Platform and the CDP to store consent data.

The decision was made after several discussions within the team and with Markus Wolf.

Why We Chose DynamoDB:

Fully Managed Service
No need to manage servers; AWS handles scaling and maintenance.

Scalable and Fast
Supports high-speed reads and writes and can scale to millions of requests per second.

Easy Deployment Across Multiple Regions
Unlike RDS, DynamoDB is easy to deploy across multiple regions.

Easy Integration
Works well with JSON Schema, which makes it suitable for a quick start.

Cost Efficient
Very low storage costs and a pay-per-use (on-demand) pricing mode.

Feature / Cost Factor	Amazon RDS (db.t3.medium)	Amazon DynamoDB (On-Demand)
Pricing Model	On-Demand	On-Demand
Instance Costs	db.t3.medium (2 vCPU, 4 GiB RAM)	No instance; serverless
Storage Costs	20 GB	20 GB
Backups	100 GB	100 GB
Reads & Writes	NA	50M reads + 30M writes per month
Cost Per Month	$97.10 USD	$69.71 USD

Cost Estimation for Single Region
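
To make the comparison in the table concrete, here is a small sketch that derives the monthly and yearly difference from the two totals. The function name is illustrative only, and the figures are the single-region estimates quoted in the table, not measured costs.

```python
from decimal import Decimal

# Monthly totals taken from the single-region estimate in the table (USD).
RDS_MONTHLY = Decimal("97.10")
DYNAMODB_MONTHLY = Decimal("69.71")


def estimated_saving(rds: Decimal, dynamodb: Decimal) -> dict:
    """Return the monthly, yearly, and relative saving of DynamoDB vs. RDS."""
    monthly = rds - dynamodb
    return {
        "monthly_usd": monthly,
        "yearly_usd": monthly * 12,
        "percent": (monthly / rds * 100).quantize(Decimal("0.1")),
    }


saving = estimated_saving(RDS_MONTHLY, DYNAMODB_MONTHLY)
print(saving["monthly_usd"])  # 27.39
print(saving["yearly_usd"])   # 328.68
print(saving["percent"])      # 28.2
```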

Backup Strategy

We enabled point-in-time recovery (PITR) for DynamoDB, which lets us restore the table to any second within the last 35 days instead of being bound to snapshots that are created once per day or hour. This is very useful for us, as we can restore the table to a point in time just before a data corruption or accidental deletion.
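
As a rough illustration of the setup described above, the following sketch builds the request parameters for enabling PITR and for restoring to a timestamp, and passes them to a boto3 DynamoDB client. The table names (`consents`, `consents-restored`) and all function names are assumptions made for this example and are not taken from our infrastructure code. Note that a PITR restore always creates a new table; it does not overwrite the source table.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

PITR_WINDOW = timedelta(days=35)  # maximum PITR retention window


def enable_pitr_params(table_name: str) -> dict:
    """Parameters for the DynamoDB UpdateContinuousBackups operation."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def restore_params(source: str, target: str, restore_to: datetime,
                   now: Optional[datetime] = None) -> dict:
    """Parameters for RestoreTableToPointInTime.

    Rejects timestamps outside the 35-day window before calling AWS.
    """
    now = now or datetime.now(timezone.utc)
    if not (now - PITR_WINDOW <= restore_to <= now):
        raise ValueError("restore_to must lie within the last 35 days")
    return {
        "SourceTableName": source,
        "TargetTableName": target,  # restores always go to a new table
        "RestoreDateTime": restore_to,
    }


def enable_pitr(client, table_name: str) -> None:
    """`client` is expected to be boto3.client("dynamodb")."""
    client.update_continuous_backups(**enable_pitr_params(table_name))


def restore(client, source: str, target: str, restore_to: datetime) -> None:
    """Restore `source` as it was at `restore_to` into the new table `target`."""
    client.restore_table_to_point_in_time(
        **restore_params(source, target, restore_to)
    )
```

A hypothetical call would look like `restore(boto3.client("dynamodb"), "consents", "consents-restored", some_utc_timestamp)`. In practice PITR would more likely be enabled through infrastructure-as-code than through a script like this.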

PITR costs $0.20 per GB per month. Our consents table in production is currently ~500 MB (at the time of writing), so we would pay ~$0.10 per month for our backups. PITR cost scales with table size; it does not depend on usage or on the number of reads and writes. The $0.20 rate covers only the storage of the backups. Restores are billed separately, per GB of data restored, which is a small amount at our current table size.
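
The backup cost estimate can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the $0.20 per GB-month rate quoted above and treats 500 MB as 0.5 GB; the function name is illustrative.

```python
from decimal import Decimal

PITR_PRICE_PER_GB_MONTH = Decimal("0.20")  # USD, rate quoted in this document


def pitr_monthly_cost(table_size_gb: Decimal) -> Decimal:
    """PITR storage cost per month; depends only on table size."""
    return (table_size_gb * PITR_PRICE_PER_GB_MONTH).quantize(Decimal("0.01"))


print(pitr_monthly_cost(Decimal("0.5")))  # current table size (~500 MB): 0.10
print(pitr_monthly_cost(Decimal("20")))   # the 20 GB used in the cost table: 4.00
```

Because the cost is linear in table size, the table would have to grow 40-fold to the 20 GB assumed in the cost estimate before PITR reached $4 per month.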

Limitations with DynamoDB:

Limited Querying Capability: Querying and updating nested objects in DynamoDB can be complex.

Limited Analytical Capability: Since DynamoDB is not a relational database, it offers limited filtering and aggregation capabilities compared to relational databases.

Multiple Tables Complexity: We are currently only dealing with user consent data. If multiple tables are involved in the future, complexity may increase, as DynamoDB has no native support for joins, unlike relational databases.
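
The first two limitations can be illustrated with a short sketch. The item shape below (one item per user, with consents nested per channel) is purely hypothetical and is not our actual table schema; the function names are likewise made up for the example. Updating a nested flag needs a document path plus a name placeholder, and an aggregation such as "granted consents per channel" has to be computed client-side after reading the items.

```python
from collections import Counter

# Hypothetical item shape, NOT the real consents schema.
ITEMS = [
    {"user_id": "u1", "consents": {"email": {"granted": True}, "sms": {"granted": False}}},
    {"user_id": "u2", "consents": {"email": {"granted": True}}},
]


def nested_update_params(user_id: str, channel: str, granted: bool) -> dict:
    """Keyword arguments for a boto3 Table.update_item call.

    The document path only works if the parent map (consents.<channel>)
    already exists on the item; otherwise DynamoDB rejects the update.
    """
    return {
        "Key": {"user_id": user_id},
        "UpdateExpression": "SET consents.#ch.granted = :g",
        "ExpressionAttributeNames": {"#ch": channel},
        "ExpressionAttributeValues": {":g": granted},
    }


def granted_per_channel(items) -> Counter:
    """Client-side aggregation, since DynamoDB has no GROUP BY equivalent."""
    counts = Counter()
    for item in items:
        for channel, consent in item.get("consents", {}).items():
            if consent.get("granted"):
                counts[channel] += 1
    return counts


print(nested_update_params("u1", "sms", True)["UpdateExpression"])
print(granted_per_channel(ITEMS))
```

In a relational database both operations would be a single `UPDATE` and a `GROUP BY` query. Here the aggregation requires reading every item first, which is one reason analytical workloads are better served elsewhere.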

Last updated: March 27, 2026 at 16:28

By: Phil Pieper

Repository: PIT-MarTech/cdp-event-handling-infrastructure