At Coder, we embarked on a major rewrite of our flagship product, culminating in a v2.0 release in late 2023. Prior to this, we started an initiative to perform comprehensive load tests to proactively identify and fix issues that would block rollouts at large scale. Spoiler: it turns out that scaling is hard, so we kept doing it.
If you look on the internet, the overall consensus is that a "scale test" is where you attempt to determine the effects of increasing user load on a given system, while a "stress test" is where you throw as much load as you can at a system to see how much it can handle. Our "scale tests" fall somewhere between the two – given a Coder deployment with a certain amount of resources, we want to determine its ability to handle a given amount of load.
When we perform a scale test, we do the following:
There's a bunch of other supporting work, but that's the gist of it. Scale testing is testing at scale.
You can unpack this question a number of ways:
We're a small company, and our internal dogfood deployment has at most 19 active users. Some deployments have thousands of active users! We're obviously not going to run into the same kinds of problems as those deployments, so it's important for us to validate that Coder performs well at that scale.
Benchmarks only test individual system components, and don't tell you what sort of behaviours you'll see at scale. Think of it as an analogue to unit tests versus integration tests – you don't just want one part of the system to perform well, you also want the whole system to perform well.
There are a number of reasons, but the main ones are:
We'll go into more detail about our Kubernetes scale testing environment in a later post.
Because scaling is hard, it also follows that testing at scale is hard:
Most definitely! We found many issues, both large and small, for example:
We have fairly comprehensive steps documented in our GitHub repository, and we also have more detailed documentation about our scale testing method. But here's a quick version using KinD. Note that you will be constrained by the CPU and memory resources available on your host machine.
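Before creating the cluster, it's worth checking what the host can actually spare. This is a rough preflight sketch (not part of our tooling, and it assumes a Linux host, since `nproc` and `/proc/meminfo` are Linux-specific):

```shell
# Rough preflight: how much CPU and memory can this host give KinD?
CPUS=$(nproc)
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
MEM_GB=$((MEM_KB / 1024 / 1024))
echo "host: ${CPUS} CPUs, ~${MEM_GB} GiB RAM"
```

If the numbers are small, scale down the workspace count and per-workspace resources in the commands below accordingly.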
kind create cluster --name coder
kubectl cluster-info
helm repo add coder-v2 https://helm.coder.com/v2
helm repo update
helm install coder coder-v2/coder \
--namespace coder \
--create-namespace \
--set coder.resources.limits.cpu=1 \
--set coder.resources.limits.memory=1Gi
kubectl --namespace coder exec deployment/coder -- \
coder login \
--first-user-username=admin \
--first-user-email=admin@example.com \
--first-user-password=SomeSecurePassw0rd \
--first-user-trial=false
kubectl --namespace coder port-forward service/coder 8080:80 &
kubectl --namespace coder exec deployment/coder -- \
coder templates init \
--id kubernetes /tmp/kubernetes
kubectl --namespace coder exec deployment/coder -- \
coder templates push kubernetes \
-d /tmp/kubernetes \
--variable namespace=coder \
--yes
kubectl --namespace coder exec deployment/coder -- \
coder exp scaletest create-workspaces \
--template kubernetes \
--count=3 \
--parameter cpu=2 \
--parameter memory=2 \
--parameter home_disk_size=1 \
--no-cleanup
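As a back-of-envelope sanity check (plain shell arithmetic, not part of the scaletest tooling), the three workspaces created above will collectively request the following, assuming the `cpu`, `memory`, and `home_disk_size` parameters are in cores and GiB respectively:

```shell
# Aggregate resources requested by the workspaces above:
# count=3, cpu=2 cores, memory=2 GiB, home_disk_size=1 GiB each.
COUNT=3; CPU=2; MEM=2; DISK=1
SUMMARY="$((COUNT * CPU)) CPUs, $((COUNT * MEM)) GiB RAM, $((COUNT * DISK)) GiB disk"
echo "$SUMMARY"
```

This is why the host-resource caveat above matters: even a small test multiplies quickly with workspace count.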
kubectl --namespace coder exec deployment/coder -- \
coder exp scaletest workspace-traffic \
--concurrency=0 \
--bytes-per-tick=128 \
--tick-interval=100ms \
--ssh \
--timeout=60s
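To get a feel for how much traffic that command generates, here's a quick estimate (plain shell arithmetic, assuming `--bytes-per-tick` bytes are sent every `--tick-interval` per workspace):

```shell
# Estimated SSH traffic from the workspace-traffic command above:
# 128 bytes every 100ms per workspace, across 3 workspaces.
BYTES_PER_TICK=128
TICKS_PER_SEC=10   # 1s / 100ms tick interval
WORKSPACES=3
PER_WS=$((BYTES_PER_TICK * TICKS_PER_SEC))  # bytes/s per workspace
TOTAL=$((PER_WS * WORKSPACES))              # bytes/s aggregate
echo "per-workspace: ${PER_WS} B/s, total: ${TOTAL} B/s"
```

Bumping `--bytes-per-tick` or shrinking `--tick-interval` scales this linearly, which makes it easy to dial traffic up gradually while watching the deployment.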
kubectl --namespace coder exec deployment/coder -- \
coder exp scaletest cleanup
kind delete cluster --name coder
For more information on scale testing, you can see our online documentation or run coder exp scaletest --help. Note that the exp scaletest command is not included in the agent (aka. "slim" binary) to save space, so make sure you are running the full Coder binary by checking the output of coder --version.