Skip to content

Backup

See also https://statista.atlassian.net/wiki/spaces/RA/pages/4354310156/How+to+-+RDS+Backup+create+and+apply

:warning: Last updated: 2023-11-06

Database backup

Currently, we are making database backups using RDS snapshots. A snapshot of the whole database is taken every 24 hours and is kept for 7 day. This is the default configuration for RDS snapshots.

Database backup restore

The database restore is managed via the configuration in stacks/config.py with the parameters * recovery_enabled if True, then a second database will be created from a given snapshot * keep_original_db if True, then the original database will be kept, otherwise it will be deleted. Should be True except for Stage or Dev setups, where a Snapshot DB is enough. * active_sgpt_secret_key decides which secret and therefore which database is used for Liquibase deployment and in the application code. * recovery_removal_policy decides if the recovery database can be deleted or not. * recovery_snapshot_id is the snapshot id of the snapshot that should be used for the recovery. * recovery_secret_name is the name of the secret that should be used for the recovery.

For a quick recovery it should be enough to adjust the parameters for prod like this:

'recovery_enabled': True,
'keep_original_db': True,
'active_sgpt_secret_key': 'rds/statistagptdb_recovery',
'recovery_removal_policy': RemovalPolicy.SNAPSHOT,
'recovery_snapshot_id': '<latest-snapshot-id>',
'recovery_secret_name': 'rds/statistagptdb_recovery',

This would lead to a second DB being created from the given snapshot and the original DB being kept. The application would automatically connect to the new DB created from the snapshot. The original DB would be kept and should be fixed manually as explained in the next chapter.

Database backup point in time recovery

It is also possible to recover the database to a specific point in time within the retention period of the snapshots. Currently, there is no automated way to do this. However, it is possible to do this manually via the AWS console.

Fixing the original database after a recovery

After a recovery, it is important to fix the original DB and reset the configuration to work with the original DB, so that the next recovery will work as expected and that there is no drift in CDK. Covering all scenarios is not possible, but the basic case in which you want to move the content of your Snapshot DB to the original DB will be explained here.

Fixing the original DB

To copy all the data from the Snapshot DB to the original DB, you can use DMS (Database Migration Service). Due to a problem with DMS and pgvector, you need to change the type of the embedding_vector column to text before you can start the replication. Therefore, you also need to delete the index on the embedding_vector column. Best is to run a clean Liquibase deployment after deleting and recreating the schema publicand then executing:

DROP INDEX <index_name>;
ALTER table embeddings
ALTER COLUMN embedding_vector TYPE text;

Now you can go ahead and configure a source and target endpoint in DMS. For the target endpoint, you need to add the extra connection attribute afterConnectScript=SET session_replication_role='replica'. Without it, the replication will fail on foreign key constraints.

Next you can configure a serverless replication job. Make sure to have the following configuration: * Go Full load * Target table preparation mode: Do nothing (make sure that you have prepared everything yourself) * enable Logging and set log levels to DEBUG (INFO is often not enough) * Full LOB mode - preferably 512kb size (should be enough to include a full vector) * schema: public, table: %, filter for databasechangelog(lock), no transformation rules. * VPC, Subnet and security groups - the ones used for DB * Single AZ should be fine * Capacity settings: Max 4 DMS units should be enough - the more, the faster

After the replication is finished, you can change the type back to vector with the following command:

ALTER table embeddings
ALTER COLUMN embedding_vector TYPE vector(1536) USING embedding_vector::vector;

Resetting the configuration

Once you have fixed the original DB, you can test it by changing the active_sgpt_secret_key back to the original key:

'active_sgpt_secret_key': 'rds/statistagptdb'

If this works, then you can go ahead and destroy the recovery DB by setting recovery_enabled back to False:

'recovery_enabled': False