Circuit Breaker

Fail Fast with Max Connections and Max Pending Requests

For this scenario, we will equally split the traffic between recommendation v1 and the slow version of v2 :

Deploy the slow version of recommendation:v2 making it a slow performer with a 3 second delay.

kubectl apply -f manifests/kubernetes/recommendation-v2_slow.yml

Create the v1 and v2 subsets.

kubectl apply -f manifests/istio/routing-simple/recommendation-destination-rule-v1-v2.yml

Create the virtual service that splits traffic equally between v1 and V2

kubectl apply -f manifests/istio/circuit-breaker/fail-fast/recommendation-virtual-service-v1_and_v2_50_50.yml
Virtual Service
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
        subset: version-v1
      weight: 50
    - destination:
        host: recommendation
        subset: version-v2
      weight: 50

Test 1: Load test without circuit breaker

Let’s perform a load test in our system with siege. We’ll have 20 clients sending 2 concurrent requests each:

docker run --rm funkygibbon/siege siege -r 2 -c 20 -v $customer

You should see an output similar to this:

siege output with all successful requests

All of the requests to our system were successful, but it took some time to run the test, as the v2 instance was a slow performer.

Test 2: Load test with circuit breaker

Suppose that in a production system this 3s delay was caused by too many concurrent requests to the same instance. We don’t want multiple requests getting queued or making the instance even slower. So we’ll add a circuit breaker that will open whenever we have more than 1 request being handled by any instance.

kubectl apply -f manifests/istio/circuit-breaker/fail-fast/recommendation-destination-rule-cb_policy_version.yml
Destination Rule
kind: DestinationRule
...
spec:
  host: recommendation
  subsets:
    - name: version-v1
      labels:
        version: v1
    - name: version-v2
      labels:
        version: v2
      trafficPolicy:
        connectionPool:
          http:
            http1MaxPendingRequests: 1
            maxRequestsPerConnection: 1
          tcp:
            maxConnections: 1

Now let’s see what is the behavior of the system running siege again:

docker run --rm funkygibbon/siege siege -r 2 -c 20 -v $customer

k8s istio 7.circuit breaker siege output with some 503 requests due to open circuit breaker

You can run siege multiple times, but in all of the executions you should see some 503 errors being displayed in the results. That’s the circuit breaker being opened whenever Istio detects more than 1 pending request being handled by the instance.

Clean up

# Replace the slow version by the standard one
kubectl apply -f manifests/kubernetes/recommendation-v2.yml

kubectl delete DestinationRule recommendation

kubectl delete VirtualService recommendation

Pool Ejection

Pool ejection or outlier detection is a resilience strategy that takes place whenever we have a pool of instances/pods to serve a client request. If the request is forwarded to a certain instance and it fails (e.g. returns a 50x error code), then Istio will eject this instance from the pool for a certain sleep window. In our example the sleep window is configured to be 15s. This increases the overall availability by making sure that only healthy pods participate in the pool of instances.

First, you need to insure you have a destinationrule and virtualservice in place to send traffic to the services. Let’s use a 50/50 split of traffic:

k8s istio 7.pool ejection

kubectl apply -f manifests/istio/routing-simple/recommendation-destination-rule-v1-v2.yml

kubectl apply -f manifests/istio/circuit-breaker/fail-fast/recommendation-virtual-service-v1_and_v2_50_50.yml

Scale number of instances of v2 deployment

kubectl scale deployment recommendation-v2 --replicas=2

Wait for all the pods to be in the ready state.

Test 1: Load test without failing instances

Throw some requests at the customer endpoint:

while true; do curl $customer; sleep .5; done

You will see the load balancing 50/50 between the two different versions of the recommendation service. And within version v2, you will also see that some requests are handled by one pod and some requests are handled by the other pod.

customer => preference => recommendation v1 from '2039379827-jmm6x': 447
customer => preference => recommendation v2 from '2036617847-spdrb': 26
customer => preference => recommendation v1 from '2039379827-jmm6x': 448
customer => preference => recommendation v2 from '2036617847-spdrb': 27
customer => preference => recommendation v1 from '2039379827-jmm6x': 449
customer => preference => recommendation v1 from '2039379827-jmm6x': 450
customer => preference => recommendation v2 from '2036617847-spdrb': 28
customer => preference => recommendation v1 from '2039379827-jmm6x': 451
customer => preference => recommendation v1 from '2039379827-jmm6x': 452
customer => preference => recommendation v2 from '2036617847-spdrb': 29
customer => preference => recommendation v2 from '2036617847-spdrb': 30
customer => preference => recommendation v2 from '2036617847-hdjv2': 216
customer => preference => recommendation v1 from '2039379827-jmm6x': 453
customer => preference => recommendation v2 from '2036617847-spdrb': 31
customer => preference => recommendation v2 from '2036617847-hdjv2': 217
customer => preference => recommendation v2 from '2036617847-hdjv2': 218
customer => preference => recommendation v1 from '2039379827-jmm6x': 454
customer => preference => recommendation v1 from '2039379827-jmm6x': 455
customer => preference => recommendation v2 from '2036617847-hdjv2': 219
customer => preference => recommendation v2 from '2036617847-hdjv2': 220

Test 2: Load test with failing instance

Let’s get the name of the pods from recommendation v2:

kubectl get pods -l app=recommendation,version=v2

You should see something like this:

NAME                                 READY     STATUS    RESTARTS   AGE
recommendation-v2-777598bdc4-d97p7   2/2       Running   0          2m
recommendation-v2-777598bdc4-kr7rt   2/2       Running   0          51m

Now we’ll get into one the pods and add some erratic behavior on it. Get one of the pod names from your system and replace on the following command accordingly:

kubectl exec -it $(kubectl get pods|grep recommendation-v2|awk '{ print $1 }'|head -1) -c recommendation -- curl 127.0.0.1:8080/misbehave

Output:  Following requests to '/' will return a 503

This is a special endpoint that will make our application return only 503s.

The error rate is 25 %. Because 1/2 of v2 is in error, v2 is 50% of traffic, hence 1/2 of 50% is in error which is equal to 25%.

k8s istio 7.pool ejection 2

Throw some requests at the customer endpoint:

while true; do curl $customer; sleep .5; done

You’ll see that whenever the pod recommendation-v2-2036617847-spdrb receives a request, you get a 503 error:

customer => preference => recommendation v1 from '2039379827-jmm6x': 494
customer => preference => recommendation v1 from '2039379827-jmm6x': 495
customer => preference => recommendation v2 from '2036617847-hdjv2': 248
customer => preference => recommendation v1 from '2039379827-jmm6x': 496
customer => preference => recommendation v1 from '2039379827-jmm6x': 497
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v2 from '2036617847-hdjv2': 249
customer => preference => recommendation v1 from '2039379827-jmm6x': 498
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v2 from '2036617847-hdjv2': 250
customer => preference => recommendation v1 from '2039379827-jmm6x': 499
customer => preference => recommendation v1 from '2039379827-jmm6x': 500
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v1 from '2039379827-jmm6x': 501
customer => preference => recommendation v2 from '2036617847-hdjv2': 251
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'

Test 3: Load test with failing instance and with pool ejection

Now let’s add the pool ejection behavior:

kubectl apply -f manifests/istio/circuit-breaker/pool-ejection/recommendation-destination-rule-cb_policy_pool_ejection.yml
kind: DestinationRule
...
spec:
  host: recommendation
  subsets:
  ...
  - labels:
      version: v2
    name: version-v2
    trafficPolicy:
      connectionPool:
        http: {}
        tcp: {}
      loadBalancer:
        simple: RANDOM
      outlierDetection:
        baseEjectionTime: 15.000s
        consecutiveErrors: 1
        interval: 5.000s
        maxEjectionPercent: 100

Throw some requests at the customer endpoint:

while true; do curl $customer; sleep .5; done

You will see that whenever you get a failing request with 503 from the pod recommendation-v2-2036617847-spdrb, it gets ejected from the pool, and it doesn’t receive any more requests until the sleep window expires - which takes at least 15s.

The number of errors dropped significantly but we still have erros.

customer => preference => recommendation v1 from '2039379827-jmm6x': 509
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v1 from '2039379827-jmm6x': 510
customer => preference => recommendation v1 from '2039379827-jmm6x': 511
customer => preference => recommendation v1 from '2039379827-jmm6x': 512
customer => preference => recommendation v1 from '2039379827-jmm6x': 513
customer => preference => recommendation v1 from '2039379827-jmm6x': 514
customer => preference => recommendation v2 from '2036617847-hdjv2': 256
customer => preference => recommendation v2 from '2036617847-hdjv2': 257
customer => preference => recommendation v1 from '2039379827-jmm6x': 515
customer => preference => recommendation v2 from '2036617847-hdjv2': 258
customer => preference => recommendation v2 from '2036617847-hdjv2': 259
customer => preference => recommendation v2 from '2036617847-hdjv2': 260
customer => preference => recommendation v1 from '2039379827-jmm6x': 516
customer => preference => recommendation v1 from '2039379827-jmm6x': 517
customer => preference => recommendation v1 from '2039379827-jmm6x': 518
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v1 from '2039379827-jmm6x': 519
customer => preference => recommendation v1 from '2039379827-jmm6x': 520
customer => preference => recommendation v1 from '2039379827-jmm6x': 521
customer => preference => recommendation v2 from '2036617847-hdjv2': 261
customer => preference => recommendation v2 from '2036617847-hdjv2': 262
customer => preference => recommendation v2 from '2036617847-hdjv2': 263
customer => preference => recommendation v1 from '2039379827-jmm6x': 522
customer => preference => recommendation v1 from '2039379827-jmm6x': 523
customer => preference => recommendation v2 from '2036617847-hdjv2': 264
customer => preference => recommendation v1 from '2039379827-jmm6x': 524
customer => preference => recommendation v1 from '2039379827-jmm6x': 525
customer => preference => recommendation v1 from '2039379827-jmm6x': 526
customer => preference => recommendation v1 from '2039379827-jmm6x': 527
customer => preference => recommendation v2 from '2036617847-hdjv2': 265
customer => preference => recommendation v2 from '2036617847-hdjv2': 266
customer => preference => recommendation v1 from '2039379827-jmm6x': 528
customer => preference => recommendation v2 from '2036617847-hdjv2': 267
customer => preference => recommendation v2 from '2036617847-hdjv2': 268
customer => preference => recommendation v2 from '2036617847-hdjv2': 269
customer => 503 preference => 503 recommendation misbehavior from '2036617847-spdrb'
customer => preference => recommendation v1 from '2039379827-jmm6x': 529
customer => preference => recommendation v2 from '2036617847-hdjv2': 270

Test 4: Ultimate resilience with retries, circuit breaker, and pool ejection

Even with pool ejection your application doesn’t look that resilient. That’s probably because we’re still letting some errors to be propagated to our clients. But we can improve this. If we have enough instances and/or versions of a specific service running into our system, we can combine multiple Istio capabilities to achieve the ultimate backend resilience:

  • Circuit Breaker to avoid multiple concurrent requests to an instance;

  • Pool Ejection to remove failing instances from the pool of responding instances;

  • Retries to forward the request to another instance just in case we get an open circuit breaker and/or pool ejection;

By simply adding a retry configuration to our current virtualservice, we’ll be able to get rid completely of our 503s requests. This means that whenever we receive a failed request from an ejected instance, Istio will forward the request to another supposably healthy instance.

k8s istio 7.pool ejection retry

kubectl apply -f manifests/istio/fault-injection/retry/recommendation-virtual-service-v2_retry.yml
kind: VirtualService
...
spec:
  hosts:
  - recommendation
  http:
  - retries:
      attempts: 3
      perTryTimeout: 4.000s
    route:
    - destination:
        host: recommendation
        subset: version-v1
      weight: 50
    - destination:
        host: recommendation
        subset: version-v2
      weight: 50

Throw some requests at the customer endpoint:

while true; do curl $customer; sleep .5; done

You won’t receive 503`s anymore. But the requests from recommendation `v2 are still taking more time to get a response:

customer => preference => recommendation v1 from '2039379827-jmm6x': 538
customer => preference => recommendation v1 from '2039379827-jmm6x': 539
customer => preference => recommendation v1 from '2039379827-jmm6x': 540
customer => preference => recommendation v2 from '2036617847-hdjv2': 281
customer => preference => recommendation v1 from '2039379827-jmm6x': 541
customer => preference => recommendation v2 from '2036617847-hdjv2': 282
customer => preference => recommendation v1 from '2039379827-jmm6x': 542
customer => preference => recommendation v1 from '2039379827-jmm6x': 543
customer => preference => recommendation v1 from '2039379827-jmm6x': 544
customer => preference => recommendation v2 from '2036617847-hdjv2': 283
customer => preference => recommendation v2 from '2036617847-hdjv2': 284
customer => preference => recommendation v1 from '2039379827-jmm6x': 545
customer => preference => recommendation v1 from '2039379827-jmm6x': 546
customer => preference => recommendation v1 from '2039379827-jmm6x': 547
customer => preference => recommendation v2 from '2036617847-hdjv2': 285
customer => preference => recommendation v2 from '2036617847-hdjv2': 286
customer => preference => recommendation v1 from '2039379827-jmm6x': 548
customer => preference => recommendation v2 from '2036617847-hdjv2': 287
customer => preference => recommendation v2 from '2036617847-hdjv2': 288
customer => preference => recommendation v1 from '2039379827-jmm6x': 549
customer => preference => recommendation v2 from '2036617847-hdjv2': 289
customer => preference => recommendation v2 from '2036617847-hdjv2': 290
customer => preference => recommendation v2 from '2036617847-hdjv2': 291
customer => preference => recommendation v2 from '2036617847-hdjv2': 292
customer => preference => recommendation v1 from '2039379827-jmm6x': 550
customer => preference => recommendation v1 from '2039379827-jmm6x': 551
customer => preference => recommendation v1 from '2039379827-jmm6x': 552
customer => preference => recommendation v1 from '2039379827-jmm6x': 553
customer => preference => recommendation v2 from '2036617847-hdjv2': 293
customer => preference => recommendation v2 from '2036617847-hdjv2': 294
customer => preference => recommendation v1 from '2039379827-jmm6x': 554

Our misbehaving pod recommendation-v2-2036617847-spdrb never shows up in the console, thanks to pool ejection and retry.

Clean up

kubectl scale deployment recommendation-v2 --replicas=1
kubectl delete pod -l app=recommendation,version=v2

kubectl delete virtualservice recommendation
kubectl delete destinationrule recommendation