
Components of the Central CDN

More about the components we use to make the Central CDN run smoothly.

CloudFront Distribution

There is one CloudFront distribution deployed per environment.

The most important steps in the source code are:

  1. Get the origin data.
    This is the information about all origins, taken from the Parameter Store. Note that the CDK code does not read this information from the Parameter Store itself; the surrounding pnpm cli deploy command does. That command writes all data into a temporary JSON file, which the CDK code then reads.
  2. Create response headers policy.
    This mainly concerns the content security policy (CSP), which is built here.
  3. Create one of the possible Lambda functions.
    See section Two Lambdas for more information.
  4. Create origins.
    For each origin in the origin data whose domain contains an actual value rather than just the initial undefined placeholder, an origin and a cache behavior are created.
  5. Create CloudFront distribution.
    With all the information gathered above, a distribution is created.
  6. Create DNS entry.
    For the respective domain a DNS entry is created that "points" at the newly created CloudFront distribution.
  7. Create Param Store entry.
    An entry in the Parameter Store is created that contains the ID of the CloudFront distribution in order for other tools or services to find that information at a defined place.
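
The filter in step 4 can be sketched as follows. This is a minimal illustration, not the actual CDK code: the field names name and domain, and the idea that an unset domain is missing, empty, or the literal string "undefined", are assumptions about the temporary JSON file.

```typescript
// Hypothetical shape of one entry in the temporary JSON file (field names are assumptions).
interface OriginData {
  name: string;
  domain?: string;
}

// Assumption: an unset domain is missing, empty, or the literal string "undefined".
function hasUsableDomain(origin: OriginData): boolean {
  const domain = (origin.domain ?? "").trim();
  return domain !== "" && domain !== "undefined";
}

// Only origins with a usable domain would get an origin and a cache behavior.
function selectDeployableOrigins(origins: OriginData[]): OriginData[] {
  return origins.filter(hasUsableDomain);
}

// Illustrative data only.
const sampleOrigins: OriginData[] = [
  { name: "app-a", domain: "app-a.example.com" },
  { name: "app-b", domain: "undefined" },
  { name: "app-c" },
];
const deployableOrigins = selectDeployableOrigins(sampleOrigins);
```

With the sample data, only app-a would be turned into an origin; app-b and app-c are skipped until their owners provide a domain.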

Param Access Roles

Param access roles are those roles that grant the application owners write access to "their" Parameter Store entries. This is mainly used by the GitHub workflows of the applications to update the domain information for their origin.

The most important steps in the source code are:

  1. Get the origin config array.
    This is the information that originally comes from the JSON files in the origin-config folder in the root of the repo. It is stored as one JSON array in an environment variable by the surrounding pnpm cli deploy command.
  2. For each of the entries in the origin config array that refer to the currently deployed env:
    1. Create a role that grants write permission for the corresponding parameter to everyone in the AWS account defined in the origin config.
    2. Create an initial Parameter Store entry with an undefined in its domain. The application owners later fill this entry with real data, which is then used as origin data when creating the CloudFront distribution.
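
The loop above can be sketched without any CDK constructs as a pure planning step. Everything named here is hypothetical: the fields name, env, and awsAccountId, the role naming scheme, and the parameter path /<env>/origins/<name> are assumptions made for illustration and are not taken from the repository.

```typescript
// Hypothetical shape of one origin config entry (field names are assumptions).
interface OriginConfig {
  name: string;
  env: string;
  awsAccountId: string;
}

// What would be created per matching entry: one role and one initial parameter.
interface ParamAccessPlan {
  roleName: string;
  trustedAccountId: string;
  parameterName: string;
  initialValue: string;
}

// originConfigJson stands in for the JSON array passed via an environment variable.
function planParamAccess(originConfigJson: string, currentEnv: string): ParamAccessPlan[] {
  const configs = JSON.parse(originConfigJson) as OriginConfig[];
  return configs
    .filter((config) => config.env === currentEnv)
    .map((config) => ({
      roleName: `${config.name}-param-access`, // hypothetical naming scheme
      trustedAccountId: config.awsAccountId,
      parameterName: `/${currentEnv}/origins/${config.name}`, // hypothetical path
      initialValue: JSON.stringify({ domain: "undefined" }), // placeholder until owners update it
    }));
}

// Illustrative input only.
const sampleConfigJson = JSON.stringify([
  { name: "app-a", env: "stage", awsAccountId: "111111111111" },
  { name: "app-a", env: "prod", awsAccountId: "222222222222" },
]);
const stagePlan = planParamAccess(sampleConfigJson, "stage");
```

Keeping the planning step separate from resource creation makes it easy to check which roles and parameters a deployment would produce for a given environment.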

Two Lambdas

As described in the CloudFront Distribution section, one of two possible Lambda functions is deployed in combination with the CloudFront distribution. Both receive incoming requests first and process them before forwarding them to the next resource.

The difference between the two functions is that the 'Forwarder Lambda' is used in the prod environment while the 'Auth Checker Lambda' is used in stage and all development environments.

Auth Checker Lambda

The stage environment is protected by a CloudFront Lambda@Edge function called AuthChecker. This Lambda function makes sure that only Statista-internal users (determined by Okta groups) have access to stage. It does so by checking the headers of every incoming request that hits the CloudFront distribution for a valid token, which must come from Okta via the Dev-Env-Authenticator.
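
The general shape of such a header check can be sketched as below. This is not the AuthChecker implementation: the header name, the injected validator function, and the 403 response are assumptions, and the real function may well redirect to a login flow instead of rejecting the request. Only the request layout (lowercase header names mapping to arrays of key/value pairs) follows the Lambda@Edge event format.

```typescript
// Minimal subset of the Lambda@Edge request and response shapes.
interface CfHeader {
  key?: string;
  value: string;
}
interface CfRequest {
  uri: string;
  headers: Record<string, CfHeader[]>;
}
interface CfResponse {
  status: string;
  statusDescription: string;
  body?: string;
}

// Token verification is injected; how the real token is verified is not shown here.
type TokenValidator = (token: string) => boolean;

// Hypothetical header name; the real header carrying the token may differ.
const TOKEN_HEADER = "authorization";

function checkAuth(request: CfRequest, isValidToken: TokenValidator): CfRequest | CfResponse {
  const token = request.headers[TOKEN_HEADER]?.[0]?.value;
  if (token !== undefined && isValidToken(token)) {
    return request; // valid token: let the request continue
  }
  // Assumption: reject outright; a redirect to a login page is equally plausible.
  return { status: "403", statusDescription: "Forbidden", body: "Access restricted" };
}
```

Returning the unchanged request lets CloudFront continue processing, while returning a response object answers the viewer directly at the edge.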

Forwarder Lambda

In prod, no additional authentication is necessary. The only reason to have a CloudFront function that gets every incoming request is to add some header information to this request. The request is then forwarded to the CloudFront distribution.

Currently, only the headers Forwarded and X-Forwarded-Host are added. The function also performs canonical redirects to our defined domain names (e.g. www.statista.com instead of statista.fr).
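
The behavior described above can be sketched as follows. This is an illustration under assumptions, not the Forwarder source: the redirect map contains only the example from this page, the 301 status code and the value format of the Forwarded header (host and proto=https) are guesses, and the function name is hypothetical.

```typescript
// Minimal subset of the Lambda@Edge request and response shapes.
interface CfHeader {
  key?: string;
  value: string;
}
interface CfRequest {
  uri: string;
  querystring?: string;
  headers: Record<string, CfHeader[]>;
}
interface CfRedirect {
  status: string;
  statusDescription: string;
  headers: Record<string, CfHeader[]>;
}

// Illustrative mapping only; the real list of canonical domains lives elsewhere.
const CANONICAL_HOSTS: Record<string, string> = {
  "statista.fr": "www.statista.com",
};

function forwardRequest(request: CfRequest): CfRequest | CfRedirect {
  const host = (request.headers["host"]?.[0]?.value ?? "").toLowerCase();
  const canonicalHost = CANONICAL_HOSTS[host];

  if (canonicalHost !== undefined) {
    // Canonical redirect: keep path and query string, swap the host.
    const query = request.querystring ? `?${request.querystring}` : "";
    return {
      status: "301", // assumption: permanent redirect
      statusDescription: "Moved Permanently",
      headers: {
        location: [{ key: "Location", value: `https://${canonicalHost}${request.uri}${query}` }],
      },
    };
  }

  // Otherwise add the two headers and pass the request on.
  request.headers["forwarded"] = [{ key: "Forwarded", value: `host=${host};proto=https` }];
  request.headers["x-forwarded-host"] = [{ key: "X-Forwarded-Host", value: host }];
  return request;
}
```

Checking for a redirect first avoids adding headers to a request that is never forwarded.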

Deployment Trigger

The deployment trigger starts the GitHub workflow CICD whenever one of the origin data entries in the Parameter Store changes.

Its source code is split into two parts: the deployment trigger Lambda function, which triggers the GitHub workflow, and the CDK code, which deploys this Lambda function and defines an EventBridge rule that invokes the Lambda whenever a Parameter Store entry changes.

In order for the Lambda function to be able to trigger the CICD workflow, a GitHub app was manually installed. The app's credentials have been added to the Parameter Store in the entries under /<env_name>/deploymentTrigger/config.
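
For orientation, an EventBridge event pattern that matches Parameter Store changes typically looks like the sketch below. The source and detail-type values are the ones Systems Manager emits for parameter changes; the prefix filter, the selected operations, and the function name are assumptions, and the pattern actually used by the CDK code may differ.

```typescript
// Shape of the raw EventBridge event pattern used in this sketch.
interface ParameterChangePattern {
  source: string[];
  "detail-type": string[];
  detail: {
    name: Array<{ prefix: string }>;
    operation: string[];
  };
}

// namePrefix is a hypothetical filter, e.g. "/stage/" to match one environment only.
function parameterChangePattern(namePrefix: string): ParameterChangePattern {
  return {
    source: ["aws.ssm"],
    "detail-type": ["Parameter Store Change"],
    detail: {
      name: [{ prefix: namePrefix }],
      operation: ["Create", "Update", "Delete"],
    },
  };
}

const stagePattern = parameterChangePattern("/stage/");
```

Restricting the rule by name prefix keeps changes in unrelated parameters from starting a deployment.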

Backup Creator

Backups refer to data that is stored in the Parameter Store for one specific environment.

The backup creator Lambda function is triggered every time a value in the Parameter Store is added, deleted, or modified. It takes all parameter entries of the environment it has been deployed for (e.g. all entries that start with /prod/), puts them into a single JSON file, and saves this file in an S3 backup bucket. The JSON file has the current date and time in its name in order to make restoration easier and to avoid overwriting earlier backups.

The S3 backup bucket is located in a different AWS account than the backed-up data: for stage backups, it is the prod account, and vice versa.
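
The core of this step, minus the AWS calls, can be sketched as follows. The object key layout, the timestamp format, and the JSON structure of the backup file are assumptions chosen for illustration; reading the parameters and writing to S3 are deliberately left out.

```typescript
// Hypothetical in-memory representation of a Parameter Store entry.
interface ParameterEntry {
  name: string;
  value: string;
}

interface BackupFile {
  key: string;
  body: string;
}

// Date and time in the file name keep backups sortable and prevent overwriting.
function backupObjectKey(env: string, now: Date): string {
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `${env}/parameters-${timestamp}.json`; // hypothetical key layout
}

function buildBackup(env: string, parameters: ParameterEntry[], now: Date): BackupFile {
  const prefix = `/${env}/`;
  // Keep only the entries of this environment (names beginning with the prefix).
  const ownParameters = parameters.filter((p) => p.name.indexOf(prefix) === 0);
  return {
    key: backupObjectKey(env, now),
    body: JSON.stringify(ownParameters, null, 2),
  };
}

// Illustrative data only.
const sampleBackup = buildBackup(
  "prod",
  [
    { name: "/prod/example/a", value: "1" },
    { name: "/stage/example/a", value: "2" },
  ],
  new Date("2024-01-02T03:04:05.678Z")
);
```

Passing the current time in as a parameter keeps the naming logic deterministic and easy to test.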

The source code of the backup creator is split into two parts: the Lambda function which does the actual work and the CDK code which deploys this Lambda function and defines an EventBridge rule to trigger the Lambda whenever a Parameter Store entry changes.

Web Application Firewall (WAF)

The WAF protects our origin servers from malicious requests and DoS/DDoS attacks. It is configured with rate limits, bot detection, and certain security filters.

The WAF is attached to the CloudFront distribution. The CDK code mainly creates webACL rules. In addition, it creates a CloudWatch LogGroup that stores the WAF logs, together with the 'LoggingConfiguration' required for that. Rule definitions are encapsulated in member functions such as createRuleForIPSet() or createRuleForUserAgent().

In order to make it as easy as possible for maintainers of the WAF to adjust the WAF configuration, IP sets and rule action overrides are stored in .config files in a separate resources directory. Please refer to the corresponding README for more information about that.

Internal Requests

If an internal caller needs to reach a service via the global domain name (xxx.statista.com), all WAF rules apply to that request. To get around rate limiting, internal requests should set the User-Agent header to statista/1.0.0 (internal request).
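
A small helper for setting this header might look like the sketch below. The helper and constant names are hypothetical; only the header value is taken from this page.

```typescript
// User-Agent value for internal requests, as documented above.
const INTERNAL_USER_AGENT = "statista/1.0.0 (internal request)";

// Returns a copy of the given headers with the internal User-Agent set.
function withInternalUserAgent(headers: Record<string, string> = {}): Record<string, string> {
  return { ...headers, "User-Agent": INTERNAL_USER_AGENT };
}

// Usage with any HTTP client that accepts a plain headers object, for example:
//   fetch("https://xxx.statista.com", { headers: withInternalUserAgent({ Accept: "application/json" }) })
const internalHeaders = withInternalUserAgent({ Accept: "application/json" });
```

Because the helper returns a new object, headers shared between callers are not modified.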

Baseline GitHub

The BaselineGithub stack is deployed in order to grant the GitHub workflows permission to add, change, or delete AWS resources. It creates an OpenIdConnectProvider and a role that contains the necessary policies.