Components of the Central CDN
More about the components we use to make the Central CDN run smoothly.
CloudFront Distribution
There is one CloudFront distribution deployed per environment.
The most important steps in the source code are:
- Get the origin data.
This is the information about all origins, taken from the Parameter Store. Please note that it is not the CDK code that reads this information from the Parameter Store but the surrounding pnpm cli deploy command. That command puts all data into a temporary JSON file, which is read by the CDK code.
- Create response headers policy.
This especially refers to the content security policy (CSP), which is built here.
- Create one of the possible Lambda functions.
See section Two Lambdas for more information.
- Create origins.
For each origin in the origin data that has more than an undefined in its domain, an origin and a cache behavior are created.
- Create CloudFront distribution.
With all the information gathered above, a distribution is created.
- Create DNS entry.
For the respective domain, a DNS entry is created that "points" at the newly created CloudFront distribution.
- Create Param Store entry.
An entry in the Parameter Store is created that contains the ID of the CloudFront distribution, so that other tools or services can find that information at a defined place.
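The "Create origins" step can be illustrated with a minimal sketch. The entry shape (`OriginEntry` with `name` and `domain`) and the assumption that the placeholder is the literal string "undefined" are hypothetical; the actual layout of the temporary JSON file is not shown in this document.

```typescript
// Hypothetical shape of one entry in the temporary origin data JSON file.
interface OriginEntry {
  name: string;
  domain: string;
}

// Keep only entries whose domain has been filled in, i.e. is more than
// the placeholder "undefined" (assumed here to be a literal string).
function selectDeployableOrigins(entries: OriginEntry[]): OriginEntry[] {
  return entries.filter((entry) => {
    const domain = entry.domain.trim();
    return domain !== "" && domain !== "undefined";
  });
}

// Example input, as it might be parsed from the temporary JSON file.
const sampleOriginData: OriginEntry[] = JSON.parse(
  '[{"name":"shop","domain":"shop.example.com"},{"name":"blog","domain":"undefined"}]'
);
const deployableOrigins = selectDeployableOrigins(sampleOriginData);
```

In this sketch only the "shop" entry would receive an origin and a cache behavior; the "blog" entry is skipped until its owners fill in a domain.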
Param Access Roles
Param access roles are those roles that grant the application owners write access to "their" Parameter Store entries. This is mainly used by the GitHub workflows of the applications to update the domain information for their origin.
The most important steps in the source code are:
- Get the origin config array.
This is the information that originally comes from the JSON files in the origin-config folder in the root of the repo. It is stored as one JSON array in an environment variable by the surrounding pnpm cli deploy command.
- For each of the entries in the origin config array that refer to the currently deployed env:
  - Create a role that grants write permission to the corresponding param for everyone in the AWS account which is defined in the origin config.
  - Create an initial Parameter Store entry with an undefined in its domain. This is later filled by the application owners with reasonable data and used as origin data when creating the CloudFront distribution.
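The steps above can be sketched as a pure planning function. Everything specific here is an assumption for illustration: the field names (`name`, `env`, `awsAccountId`), the role naming scheme, and the parameter path layout are hypothetical and may differ from the real origin config and CDK code.

```typescript
// Hypothetical shape of one origin config entry; the real JSON files in the
// origin-config folder may use different field names.
interface OriginConfigEntry {
  name: string;
  env: string;
  awsAccountId: string;
}

// What the CDK code would need to create for one entry (illustrative only).
interface ParamAccessPlan {
  roleName: string;
  trustedAccountId: string;
  parameterName: string;
  initialValue: string;
}

// originConfigJson is the JSON array as it would arrive via the
// environment variable; deployedEnv is the environment being deployed.
function planParamAccess(originConfigJson: string, deployedEnv: string): ParamAccessPlan[] {
  const entries: OriginConfigEntry[] = JSON.parse(originConfigJson);
  return entries
    .filter((entry) => entry.env === deployedEnv)
    .map((entry) => ({
      // Hypothetical naming scheme for the role and the parameter.
      roleName: `param-access-${deployedEnv}-${entry.name}`,
      trustedAccountId: entry.awsAccountId,
      parameterName: `/${deployedEnv}/origins/${entry.name}`,
      // Initial entry with a placeholder domain, filled in later by the owners.
      initialValue: JSON.stringify({ domain: "undefined" }),
    }));
}
```

Keeping the selection logic separate from the CDK constructs makes it easy to check which roles and parameters a given origin config would produce before deploying.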
Two Lambdas
As described in the CloudFront Distribution section, there are two possible Lambda functions which can be deployed in combination with the CloudFront distribution. Both of them receive incoming requests first and process each request before forwarding it to the next resource.
The difference between the two functions is that the 'Forwarder Lambda' is used
in the prod environment while the 'Auth Checker Lambda' is used in stage and
all development environments.
Auth Checker Lambda
The stage environment is protected by a CloudFront Lambda@Edge function called
AuthChecker. This Lambda function makes sure that only Statista-internal users
(determined by Okta groups) have access to stage. It does so by checking the
headers of every incoming request hitting the CloudFront distribution for a
valid token, which needs to come from Okta via the Dev-Env-Authenticator.
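As a rough illustration of the header check, the following sketch uses the Lambda@Edge request shape (lower-case header names mapping to arrays of values). The header name, the 401 response, and the injected `isTokenValid` callback are assumptions; the real AuthChecker may read a cookie, redirect to a login page, and validate the token differently.

```typescript
// Minimal subset of the Lambda@Edge request/response shapes used here.
interface EdgeHeaderMap {
  [lowerCaseName: string]: { key?: string; value: string }[];
}
interface EdgeRequest {
  uri: string;
  headers: EdgeHeaderMap;
}
interface EdgeResponse {
  status: string;
  statusDescription: string;
}

// Hypothetical header name; not taken from the real AuthChecker.
const TOKEN_HEADER = "authorization";

// Returns the unchanged request if the token is valid, otherwise a
// generated 401 response. Token validation itself is injected, because
// how the Okta token is verified is not described in this document.
function checkAuth(
  request: EdgeRequest,
  isTokenValid: (token: string) => boolean
): EdgeRequest | EdgeResponse {
  const values = request.headers[TOKEN_HEADER] || [];
  const token = values.length > 0 ? values[0].value : "";
  if (token !== "" && isTokenValid(token)) {
    return request; // pass the request on to the next resource
  }
  return { status: "401", statusDescription: "Unauthorized" };
}
```

In a Lambda@Edge handler, returning the request object lets CloudFront continue processing, while returning a response object answers the viewer directly.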
Forwarder Lambda
In prod, no additional authentication is necessary. The only reason to have a
CloudFront function that gets every incoming request is to add some header
information to this request. The request is then forwarded to the CloudFront
distribution.
Currently, only the headers Forwarded and X-Forwarded-Host are added. The
function also performs canonical redirects to our defined domain names (e.g.
www.statista.com instead of statista.fr).
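The behavior described above can be sketched as follows, again using the Lambda@Edge request shape. The redirect map, the 301 status, and the exact value written into the Forwarded header are assumptions for illustration; only the header names and the example domains come from this document.

```typescript
// Minimal subset of the Lambda@Edge request/response shapes used here.
interface FwdHeaderMap {
  [lowerCaseName: string]: { key?: string; value: string }[];
}
interface FwdRequest {
  uri: string;
  querystring: string;
  headers: FwdHeaderMap;
}
interface FwdRedirect {
  status: string;
  statusDescription: string;
  headers: FwdHeaderMap;
}

// Hypothetical redirect map: non-canonical host -> canonical host.
const CANONICAL_HOSTS: { [host: string]: string } = {
  "statista.fr": "www.statista.com",
};

function forwardRequest(request: FwdRequest): FwdRequest | FwdRedirect {
  const hostValues = request.headers["host"] || [];
  const host = hostValues.length > 0 ? hostValues[0].value.toLowerCase() : "";

  // Canonical redirect: answer directly instead of forwarding.
  const canonical = CANONICAL_HOSTS[host];
  if (canonical !== undefined) {
    const query = request.querystring ? `?${request.querystring}` : "";
    return {
      status: "301",
      statusDescription: "Moved Permanently",
      headers: {
        location: [{ key: "Location", value: `https://${canonical}${request.uri}${query}` }],
      },
    };
  }

  // Otherwise add the two headers and pass the request on.
  request.headers["forwarded"] = [{ key: "Forwarded", value: `host=${host};proto=https` }];
  request.headers["x-forwarded-host"] = [{ key: "X-Forwarded-Host", value: host }];
  return request;
}
```

The sketch keeps path and query string intact on redirect, which is a common choice for canonical redirects but is not confirmed for the actual function.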
Deployment Trigger
The deployment trigger starts the GitHub workflow CICD whenever one of the origin data entries in the Parameter Store changes.
Its source code is split into two parts: the deployment trigger Lambda function, which triggers the GitHub workflow, and the CDK code, which deploys this Lambda function and defines an EventBridge rule that invokes the Lambda whenever a Parameter Store entry changes.
In order for the Lambda function to be able to trigger the CICD workflow, a
GitHub app was manually installed. The app's credentials have been added to the
Parameter Store in the entries under /<env_name>/deploymentTrigger/config.
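To make the flow more concrete, here is a minimal sketch of the decision the Lambda function has to make: given a Parameter Store change event from EventBridge, build the request for GitHub's workflow dispatch endpoint. The event fields (`source`, `detail-type`, `detail.name`) follow the documented Systems Manager event format; the `DispatchTarget` values and the prefix filter are assumptions, and authentication with the GitHub app's credentials is left out.

```typescript
// Subset of the EventBridge event emitted on Parameter Store changes.
interface ParamChangeEvent {
  source: string;
  "detail-type": string;
  detail: { name: string; operation: string };
}

// Hypothetical description of the workflow to trigger.
interface DispatchTarget {
  owner: string;
  repo: string;
  workflowFile: string;
  ref: string;
}

interface DispatchRequest {
  url: string;
  method: string;
  body: string;
}

// Returns the HTTP request to send, or null if the event is not a
// Parameter Store change for the given environment.
function buildDispatch(
  evt: ParamChangeEvent,
  envName: string,
  target: DispatchTarget
): DispatchRequest | null {
  const isParamChange =
    evt.source === "aws.ssm" && evt["detail-type"] === "Parameter Store Change";
  const inEnv = evt.detail.name.indexOf(`/${envName}/`) === 0;
  if (!isParamChange || !inEnv) {
    return null;
  }
  return {
    url: `https://api.github.com/repos/${target.owner}/${target.repo}/actions/workflows/${target.workflowFile}/dispatches`,
    method: "POST",
    body: JSON.stringify({ ref: target.ref }),
  };
}
```

In practice the EventBridge rule would already filter on source and detail type, so the check inside the function is only a second line of defense.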
Backup Creator
Backups refer to data that is stored in the Parameter Store for one specific environment.
The backup creator Lambda function is triggered every time a value in the
Parameter Store is added, deleted, or modified. It takes all parameter entries
of the environment it has been deployed for (e.g. all entries that start with
/prod/), puts them into a single JSON file, and saves this file in an S3
backup bucket. The JSON file has the current date and time in its name in order
to make a restoration easier and to avoid overwriting.
The S3 backup bucket is located in a different AWS account compared to the
location of the backed-up data: For stage backups, it is the prod account
and vice versa.
The source code of the backup creator is split into two parts: the Lambda function, which does the actual work, and the CDK code, which deploys this Lambda function and defines an EventBridge rule that invokes the Lambda whenever a Parameter Store entry changes.
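The core of the backup step, selecting the entries of one environment and deriving a timestamped file name, might look like the sketch below. The `ParamEntry` shape and the object key layout are assumptions; reading from the Parameter Store and writing to S3 are omitted.

```typescript
// Hypothetical in-memory representation of one Parameter Store entry.
interface ParamEntry {
  name: string;
  value: string;
}

// Builds the backup file for one environment: all entries below
// "/<envName>/", serialized as JSON, with date and time in the key so
// that earlier backups are never overwritten.
function buildBackup(
  params: ParamEntry[],
  envName: string,
  now: Date
): { key: string; body: string } {
  const prefix = `/${envName}/`;
  const selected = params.filter((p) => p.name.indexOf(prefix) === 0);
  // ISO timestamp with ":" and "." replaced to keep the key file-system friendly.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  return {
    key: `${envName}/backup-${stamp}.json`,
    body: JSON.stringify(selected, null, 2),
  };
}
```

Passing the current time in as a parameter keeps the function deterministic, which makes it straightforward to unit test.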
Web Application Firewall (WAF)
The WAF protects our origin servers from malicious requests and DoS/DDoS attacks. It is configured with rate limits, bot detection, and certain security filters.
The WAF is attached to the CloudFront distribution. The
CDK code mainly creates webACL rules. In addition,
it creates a CloudWatch LogGroup that stores WAF logs. For that, a
'LoggingConfiguration' is also created. Defining rules is encapsulated in some
member functions like createRuleForIPSet() or createRuleForUserAgent().
In order to make it as easy as possible for maintainers of the WAF to adjust the
WAF configuration, IP sets and rule action overrides are stored in .config
files in a separate resources directory. Please refer to the corresponding
README for more information about that.
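For orientation, a rule for an IP set could be assembled roughly as in the sketch below. The object layout mirrors the rule properties of a WAFv2 web ACL (name, priority, action, statement, visibilityConfig); the function name, its parameters, and the metric naming are assumptions, since the actual signature of createRuleForIPSet() is not shown here.

```typescript
// Plain-object sketch of a WAFv2 web ACL rule that references an IP set.
interface WafRuleSketch {
  name: string;
  priority: number;
  action: { block: {} } | { allow: {} };
  statement: { ipSetReferenceStatement: { arn: string } };
  visibilityConfig: {
    sampledRequestsEnabled: boolean;
    cloudWatchMetricsEnabled: boolean;
    metricName: string;
  };
}

// Hypothetical helper, not the real createRuleForIPSet().
function ipSetRuleSketch(
  ruleName: string,
  priority: number,
  ipSetArn: string,
  block: boolean
): WafRuleSketch {
  return {
    name: ruleName,
    priority,
    // Block or allow every request whose source IP is in the set.
    action: block ? { block: {} } : { allow: {} },
    statement: { ipSetReferenceStatement: { arn: ipSetArn } },
    visibilityConfig: {
      sampledRequestsEnabled: true,
      cloudWatchMetricsEnabled: true,
      metricName: ruleName,
    },
  };
}
```

An object of this shape could then be placed in the rules list of the web ACL; priorities must be unique within one web ACL.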
Internal Requests
If an internal service needs to call another service via its global domain name
(xxx.statista.com), all WAF rules apply to that request. To get around rate
limiting, internal requests should set the User-Agent header to
statista/1.0.0 (internal request).
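A small helper along these lines could be used by internal callers to set the header consistently. The helper name and the plain-object header representation are assumptions; only the User-Agent value is taken from this document.

```typescript
// User-Agent value for internal requests, as documented above.
const INTERNAL_USER_AGENT = "statista/1.0.0 (internal request)";

// Returns a copy of the given headers with the internal User-Agent set,
// replacing any existing User-Agent regardless of its casing.
function withInternalUserAgent(
  headers: { [name: string]: string }
): { [name: string]: string } {
  const result: { [name: string]: string } = {};
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() !== "user-agent") {
      result[key] = headers[key];
    }
  }
  result["User-Agent"] = INTERNAL_USER_AGENT;
  return result;
}
```

The returned object can be passed as the headers option of an HTTP client, for example `fetch(url, { headers: withInternalUserAgent({ Accept: "application/json" }) })` in a runtime that provides `fetch`.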
Baseline GitHub
The BaselineGithub stack is deployed in
order to grant the GitHub workflows permission to add, change, or delete AWS
resources. It creates an OpenIdConnectProvider and a role which contains the
necessary policies.
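The trust relationship behind such a role typically looks like the sketch below, which builds the policy document as a plain object. The issuer host, audience, and claim names are the ones GitHub documents for OIDC with AWS; the account ID, organization, and repository are placeholders, and the policy the stack really creates may be scoped differently.

```typescript
// GitHub's OIDC issuer host, as used in AWS trust policies.
const GITHUB_OIDC_HOST = "token.actions.githubusercontent.com";

// Builds a trust policy that lets workflows of one repository assume
// the role via sts:AssumeRoleWithWebIdentity. All arguments are
// placeholders for illustration.
function githubTrustPolicy(accountId: string, org: string, repo: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: {
          Federated: `arn:aws:iam::${accountId}:oidc-provider/${GITHUB_OIDC_HOST}`,
        },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          // The token must be issued for AWS STS ...
          StringEquals: { [`${GITHUB_OIDC_HOST}:aud`]: "sts.amazonaws.com" },
          // ... and come from a workflow of the given repository.
          StringLike: { [`${GITHUB_OIDC_HOST}:sub`]: `repo:${org}/${repo}:*` },
        },
      },
    ],
  };
}
```

Restricting the `sub` claim to a repository is what prevents workflows from unrelated repositories from assuming the role.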