Data and Content Delivery Platform - Backup and recovery
All of our data can be regenerated from its sources (Numera and Data Production). Losing all data would therefore only delay the delivery of new updates, because we can always re-trigger all jobs.
Re-create statistic data
Statistics come from Numera, which sends the data on every change event. To resend all published data, use this API endpoint on Numera:
https -A bearer -a <token> \
statistic-service.numera.dev.aws.statista.com/content/reindex/statistics \
type==CONTENT_DELIVERY \
This example shows all possible query parameters (using macOS - HTTPie 3.2.4 (latest) docs). The type==CONTENT_DELIVERY parameter is mandatory and specifies that the data is sent to the Data and Content Delivery Team. Several other query options define which statistics are sent. For more information, see the swagger ui dev or swagger ui stage.
For prod, the Swagger UI is not available, but the API exists; the prod URL is 'statistic-service.numera.statista.com'.
You need specific user rights to access this endpoint, and the user must be registered in the SEM (Statista Employee Management) tool. Ask the people in Content-Tools for access to Numera if required.
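To run the same request against different environments, the call can be wrapped in a small shell helper. This is only a sketch: the function names and the NUMERA_TOKEN variable are hypothetical, and only the dev and prod hosts stated on this page are mapped. It sends the same request as the example above with just the mandatory parameter.

```shell
# Map an environment name to the Numera statistic-service host.
# Only the hosts mentioned on this page are included.
numera_host() {
  case "$1" in
    dev)  echo "statistic-service.numera.dev.aws.statista.com" ;;
    prod) echo "statistic-service.numera.statista.com" ;;
    *)    echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Ask Numera to resend all published statistics to Content Delivery.
# NUMERA_TOKEN is an assumed variable name for your bearer token.
reindex_statistics() {
  numera_api_host="$(numera_host "$1")" || return 1
  https -A bearer -a "${NUMERA_TOKEN:?set NUMERA_TOKEN first}" \
    "${numera_api_host}/content/reindex/statistics" \
    type==CONTENT_DELIVERY
}

# Usage (requires HTTPie and a valid token):
#   NUMERA_TOKEN=<token> reindex_statistics dev
```

Failing on an unknown environment name keeps a typo from sending the request to an unintended host.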
Re-create market and survey data
Survey and market data come from the Product Data department. The data is stored in their DB and can be extracted into our landing zone bucket using this ETL: https://github.com/Product-DataOps/db-export
Backup and restore the internal DB
We have 2 DAGs in Airflow to back up and restore the internal DB:
- utils_create_db_dump - Trigger Fargate task to back up the internal database to S3
- utils_recreate_internal_db - Trigger Fargate task to restore the internal database from S3

Our infrastructure defines 2 Fargate clusters for backing up and restoring the system; by default they have no running tasks or services.
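Besides the Airflow UI, these DAGs could also be triggered from a shell. The sketch below assumes the Airflow 2 CLI (`airflow dags trigger`) is available and pointed at the right environment, which this page does not confirm. The function name and the DRY_RUN switch are hypothetical; the DAG names are the ones used on this page.

```shell
# Trigger one of the DB utility DAGs named on this page.
# Set DRY_RUN=1 to print the command instead of running it.
trigger_dag() {
  case "$1" in
    utils_create_db_dump|utils_recreate_internal_db|utils_restore_from_db_dump) ;;
    *) echo "unknown DAG: $1" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "airflow dags trigger $1"
  else
    airflow dags trigger "$1"
  fi
}

# Usage:
#   DRY_RUN=1 trigger_dag utils_create_db_dump
```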

Creating a backup of the internal DB
We have a DAG that creates DB dumps and stores them on S3. When the utils_create_db_dump DAG is started, a Fargate task starts and dumps the internal DB of the corresponding environment.



The dump is stored in the 'dcd-data-bucket' bucket, in the folder backups.

The file name contains the execution timestamp, so the DAG can be run multiple times without overwriting earlier dumps.
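To check which dumps exist, the backups folder can be listed with the AWS CLI. This is a sketch that assumes the bucket is literally named 'dcd-data-bucket' in the environment you are logged in to and that your credentials may read it. The variable and function names are hypothetical.

```shell
# Bucket and folder as named on this page; adjust if your
# environment uses a different bucket name.
BACKUP_BUCKET="dcd-data-bucket"
BACKUP_PREFIX="backups"

# Print the S3 URI of the backups folder.
backup_uri() {
  echo "s3://${BACKUP_BUCKET}/${BACKUP_PREFIX}/"
}

# List all dumps in the backups folder.
list_backups() {
  aws s3 ls "$(backup_uri)"
}

# Usage (requires AWS credentials):
#   list_backups
```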

Restore internal DB from Backup
To restore the DB, use the utils_restore_from_db_dump task. It starts a task that restores the internal DB.
The dump must be named 'internal_db_backup_restore.dump', so you may have to rename the dump you want to use.
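S3 has no rename operation, so renaming a dump means copying it to the expected name. The sketch below assumes the restore task reads the dump from the same folder the original dump is in, which this page does not state explicitly. The function names are hypothetical, and `<dump-file>` is a placeholder for a real dump name.

```shell
# File name the restore task expects, as stated above.
RESTORE_NAME="internal_db_backup_restore.dump"

# Print the S3 URI a dump has to be copied to.
# $1: S3 URI of the dump you want to restore
restore_target() {
  # Strip the file name and append the expected one.
  echo "${1%/*}/${RESTORE_NAME}"
}

# Copy the chosen dump to the expected name (the original stays in place).
stage_dump_for_restore() {
  aws s3 cp "$1" "$(restore_target "$1")"
}

# Usage (requires AWS credentials):
#   stage_dump_for_restore "s3://dcd-data-bucket/backups/<dump-file>"
```

Copying instead of moving keeps the timestamped original available for a later restore.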



Use dumps from other systems
If you want to use a dump from another system, you have to use the AWS CLI tools to copy the dump into the
If you have a fresh system, make sure to run the DAG utils_recreate_internal_db before the restore.