Decision to store user consent data in DynamoDB

Context

Currently, we collect user consent data through:

  • Braze unsubscribe emails
  • Preference Center
  • User registration processes
  • Checkout processes

This data is stored entirely in Segment CDP, which has led to several issues:

  1. Single Point of Failure
    Segment is the only place where consent data is stored. If Segment has an outage or loses data, we risk permanently losing user consent records.

  2. No Control Over the Data
    All data lives in Segment’s systems. We can’t back it up, move it easily, or review it within our own infrastructure.

  3. Tightly Connected to Our Platform
    Segment is deeply integrated into our platform, making it difficult to update or change the system without breaking things.

  4. No Clear Environment Separation
    Development, staging, and production environments are not clearly separated in the current setup, which increases the risk of errors.

  5. Complex Data Handling
    Segment uses many custom functions to move and change data. These are hard to maintain, especially when changes are needed quickly.

To solve these problems, we decided to add a new database layer to store consent data in our own system. This database must:

  • Handle high read traffic
  • Be simple to use and maintain
  • Work across multiple regions
  • Be cost-effective
  • Be ready to become the main source of truth in the future

Decision

We will introduce DynamoDB as an intermediary database between the Statista Platform and the CDP to store consent data.

The decision was made after several discussions within the team and with Markus Wolf. More Information


Why We Chose DynamoDB:

  • Fully Managed Service
    No need to manage servers; AWS handles scaling and maintenance.
  • Scalable and Fast
    Supports low-latency reads and writes and can scale to millions of requests per second.
  • Easy Deployment Across Multiple Regions
    Unlike RDS, DynamoDB is easy to deploy in multiple regions.
  • Easy Integration
    Works well with JSON Schema, which makes for a quick start.
  • Cost Efficient
    Very low storage cost, with a pay-per-use (on-demand) pricing mode.

    Feature / Cost Factor    Amazon RDS (db.t3.medium)             Amazon DynamoDB (On-Demand)
    ---------------------    ----------------------------------    --------------------------------
    Pricing Model            On-Demand                             On-Demand
    Instance Costs           db.t3.medium (1 vCPU, 3.75 GiB RAM)   No instance; serverless
    Storage Costs            20 GB                                 20 GB
    Backups                  100 GB                                100 GB
    Reads & Writes           N/A                                   50M reads + 30M writes per month
    Cost Per Month           $97.10 USD                            $69.71 USD

    Cost Estimation for Single Region

Backup Strategy

We enabled point-in-time recovery (PITR) for DynamoDB. Instead of being bound to snapshots created once per day or hour, we can restore the table to any second within the last 35 days. This lets us recover the table as it was just before a data corruption or accidental deletion.
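
As a minimal sketch of what enabling and using PITR could look like with boto3, the helpers below wrap the `update_continuous_backups` and `restore_table_to_point_in_time` client calls. The table names (`consents`, `consents-restored`) and the helper names are assumptions for illustration, not taken from our deployment. Note that DynamoDB restores into a new table rather than overwriting the existing one.

```python
from datetime import datetime, timezone


def enable_pitr(client, table_name):
    """Turn on point-in-time recovery for one table."""
    return client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )


def restore_to(client, source_table, target_table, restore_time):
    """Restore source_table as it was at restore_time into a NEW table.

    The target table must not exist yet; DynamoDB does not restore in place.
    """
    if restore_time.tzinfo is None:
        # Reject naive datetimes so the restore point is never ambiguous.
        raise ValueError("restore_time must be timezone-aware")
    return client.restore_table_to_point_in_time(
        SourceTableName=source_table,
        TargetTableName=target_table,
        RestoreDateTime=restore_time.astimezone(timezone.utc),
    )


# Usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   client = boto3.client("dynamodb")
#   enable_pitr(client, "consents")
#   restore_to(client, "consents", "consents-restored",
#              datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
```

Passing the client in as an argument keeps the helpers independent of how credentials and regions are configured.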

PITR costs $0.20 per GB per month. Our production consents table is currently ~500 MB (at the time of writing), so we would pay ~$0.10 per month for our backups. PITR cost scales with table size; it is not bound to usage or read/write volume. This recurring cost covers backup storage only. AWS bills a restore separately, per GB of data restored, which is negligible at our current table size.
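
The storage estimate can be reproduced with a short calculation. This is a sketch using the $0.20 per GB-month price and the ~500 MB table size quoted in this section; the 10 GB figure is a hypothetical growth scenario, not a forecast.

```python
PITR_PRICE_PER_GB_MONTH = 0.20  # USD, price quoted in this section


def pitr_monthly_cost(table_size_gb, price_per_gb_month=PITR_PRICE_PER_GB_MONTH):
    """Monthly PITR storage cost in USD for a table of the given size."""
    return table_size_gb * price_per_gb_month


current = pitr_monthly_cost(0.5)      # ~500 MB production table
hypothetical = pitr_monthly_cost(10)  # assumed future size, for illustration only

print(f"current: ${current:.2f} per month")            # current: $0.10 per month
print(f"at 10 GB: ${hypothetical:.2f} per month")      # at 10 GB: $2.00 per month
```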

Limitations with DynamoDB:

  • Limited Querying Capability: Querying and updating nested objects in DynamoDB can be complex.
  • Limited Analytical Capability: Since DynamoDB is a NoSQL store rather than a relational database, it has limited support for filtering and aggregation operations.
  • Multiple Tables Complexity: We currently deal only with user consent data. If more tables are added in the future, complexity may increase, because DynamoDB has no native support for joins, unlike relational databases.
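
To illustrate the nested-object limitation, here is a minimal sketch of what updating a single nested attribute involves. The helper builds keyword arguments for boto3's `Table.update_item`. The key (`user_id`) and attribute path (`consents.newsletter.granted`) are hypothetical and do not reflect our actual schema.

```python
def nested_set_kwargs(key, path, value):
    """Build update_item keyword arguments that SET one nested attribute.

    path is a list of map keys, e.g. ["consents", "newsletter", "granted"].
    Each part gets a "#pN" placeholder so attribute names cannot clash
    with DynamoDB reserved words.
    """
    if not path:
        raise ValueError("path must contain at least one attribute name")
    names = {f"#p{i}": part for i, part in enumerate(path)}
    return {
        "Key": key,
        "UpdateExpression": "SET " + ".".join(names) + " = :v",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": {":v": value},
    }


kwargs = nested_set_kwargs(
    {"user_id": "u-123"},                     # hypothetical partition key
    ["consents", "newsletter", "granted"],    # hypothetical nested path
    True,
)
print(kwargs["UpdateExpression"])  # SET #p0.#p1.#p2 = :v

# Usage (requires boto3 and AWS credentials; table name is hypothetical):
#   import boto3
#   table = boto3.resource("dynamodb").Table("consents")
#   table.update_item(**kwargs)
```

The update is rejected if an intermediate map in the path (here `consents` or `consents.newsletter`) does not already exist on the item, so callers have to create parent maps first. That is part of what makes nested data more cumbersome than a flat column update in a relational database.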