Fault Injection

Apply some chaos engineering by injecting HTTP errors and network delays. Understanding failure scenarios is a critical aspect of a microservices architecture.

HTTP Error 503

By default, recommendation v1 and v2 are round-robin load-balanced, because that is the default behavior of a Kubernetes Service.

kubectl get pods -l app=recommendation

NAME                                  READY     STATUS    RESTARTS   AGE
recommendation-v1-798bf87d96-d9d95   2/2       Running   4          11h
recommendation-v2-7bc4f7f696-d9j2m   2/2       Running   1          10h

You can inject 503s for approximately 50% of the requests.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
        subset: app-recommendation
    fault:
      abort:
        httpStatus: 503
        percent: 50

kubectl apply -f manifests/istio/fault-injection/recommendation-destination-rule.yml
kubectl apply -f manifests/istio/fault-injection/inject-error/recommendation-virtual-service-503.yml

curl $customer
customer => preference => recommendation v1 from '99634814-sf4cl': 88
curl $customer
customer => 503 preference => 503 fault filter abort
curl $customer
customer => preference => recommendation v2 from '2819441432-qsp25': 51
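Under the hood, the sidecar proxy rolls a per-request percentage before forwarding. A minimal Python sketch of that abort logic (the function and handler names here are illustrative, not Envoy's actual API):

```python
import random

def maybe_abort(percent, handler):
    """Abort with a 503 for roughly `percent`% of calls; otherwise
    forward to the real handler, as the fault filter's abort rule does."""
    if random.uniform(0, 100) < percent:
        return 503, "fault filter abort"
    return 200, handler()

random.seed(42)  # deterministic demo
codes = [maybe_abort(50, lambda: "recommendation v1")[0] for _ in range(1000)]
print("503 rate:", codes.count(503) / len(codes))
```

Note that the percentage is evaluated independently per request, which is why consecutive curls alternate unpredictably between successes and 503s.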

Clean up

kubectl delete -f manifests/istio/fault-injection/inject-error/recommendation-virtual-service-503.yml

Delay

The most insidious of possible distributed computing faults is not a "down" service but a service that is responding slowly, potentially causing a cascading failure in your network of services.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - fault:
      delay:
        fixedDelay: 7.000s
        percent: 50
    route:
    - destination:
        host: recommendation
        subset: app-recommendation

kubectl apply -f manifests/istio/fault-injection/delay/recommendation-virtual-service-delay.yml

And hit the customer endpoint

while true; do curl $customer; sleep .5; done

You will notice many requests to the customer endpoint now have a delay. If you are monitoring the logs for recommendation v1 and v2, you will also see that the delay happens BEFORE the recommendation service is actually called.

kubectl logs -l app=recommendation -c recommendation
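That ordering is the key point: the sidecar sleeps before forwarding, so the upstream service never sees the request until the delay has already elapsed. A small sketch of the idea, assuming a made-up `roll` parameter standing in for the proxy's random 0-100 draw:

```python
import time

def with_injected_delay(fixed_delay_s, percent, handler, roll):
    """Sleep before forwarding, as the sidecar's delay fault does; the
    upstream handler runs only after the delay has fully elapsed.
    `roll` is an illustrative stand-in for the random percentage draw."""
    if roll < percent:
        time.sleep(fixed_delay_s)
    return handler()

start = time.monotonic()
body = with_injected_delay(0.2, 50, lambda: "recommendation v2", roll=10)
print(body, "after", round(time.monotonic() - start, 1), "s")
```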

Clean up

kubectl delete -f manifests/istio/fault-injection/recommendation-destination-rule.yml
kubectl delete -f manifests/istio/fault-injection/delay/recommendation-virtual-service-delay.yml

Retry

Instead of failing immediately, retry the request to the service up to N more times.

We will make pod recommendation-v2 fail 100% of the time. The following command automatically selects one of the recommendation-v2 pods on your system:

kubectl exec -it $(kubectl get pods|grep recommendation-v2|awk '{ print $1 }'|head -1) \
-c recommendation -- curl 127.0.0.1:8080/misbehave

This is a special endpoint that will make our application return only 503s.
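Conceptually, the endpoint just flips a flag inside the service. A hypothetical sketch of that toggle (the real implementation lives in the tutorial's recommendation service and may differ):

```python
class RecommendationService:
    """Hypothetical sketch of the /misbehave and /behave toggle:
    once the flag is set, every normal request returns a 503."""

    def __init__(self):
        self.misbehaving = False

    def handle(self, path):
        if path == "/misbehave":
            self.misbehaving = True
            return 200, "misbehaving: on"
        if path == "/behave":
            self.misbehaving = False
            return 200, "misbehaving: off"
        if self.misbehaving:
            return 503, "recommendation misbehavior"
        return 200, "recommendation v2"

svc = RecommendationService()
svc.handle("/misbehave")
print(svc.handle("/"))
```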

Now, if you hit the customer endpoint several times, you should see some 503s:

while true; do curl $customer; sleep .5; done

customer => preference => recommendation v1 from 'b87789c58-h9r4s': 864
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 865
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 866
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 867
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 868
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 869
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'

Now add the retry rule:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    retries:
      attempts: 3
      perTryTimeout: 2s

kubectl apply -f manifests/istio/fault-injection/retry/recommendation-virtual-service-v2_retry.yml

You will see it succeed every time, because Istio retries any failed request to the recommendation service until it lands on v1.
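The retry behavior can be sketched as follows: resend any 5xx response up to `attempts` extra times and return the first success. These names are illustrative, not Envoy's actual API:

```python
import itertools

def call_with_retries(send, attempts):
    """Return the first non-5xx response, retrying up to `attempts`
    extra times -- the behavior configured by retries.attempts."""
    status, body = send()
    for _ in range(attempts):
        if status < 500:
            break
        status, body = send()
    return status, body

# Two endpoints behaving like our pods: v2 misbehaving, v1 healthy.
pods = itertools.cycle([(503, "recommendation misbehavior"),
                        (200, "recommendation v1")])
print(call_with_retries(lambda: next(pods), attempts=3))
```

With round-robin alternating between a healthy and a failing pod, a single retry is already enough to reach v1, which is why the client sees no errors at all.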

while true; do curl $customer; sleep .5; done

customer => preference => recommendation v1 from '2036617847-m9glz': 196
customer => preference => recommendation v1 from '2036617847-m9glz': 197
customer => preference => recommendation v1 from '2036617847-m9glz': 198

You can see the active Virtual Services via

kubectl get virtualservices -o yaml

Now, delete the retry rule and see the old behavior, where v2 throws 503s

kubectl delete virtualservice recommendation

while true; do curl $customer; sleep .5; done

customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1118
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1119
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1120
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1121
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1122
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1123
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1124
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1125
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'
customer => preference => recommendation v1 from 'b87789c58-h9r4s': 1126
customer => 503 preference => 503 recommendation misbehavior from '6f64f9c5b-ltrhl'

Now, make the recommendation-v2 pod behave well again

kubectl exec -it $(kubectl get pods|grep recommendation-v2|awk '{ print $1 }'|head -1) \
-c recommendation -- curl 127.0.0.1:8080/behave

The application is back to random load-balancing between v1 and v2

while true; do curl $customer; sleep .5; done

customer => preference => recommendation v1 from '2039379827-h58vw': 129
customer => preference => recommendation v2 from '2036617847-m9glz': 207
customer => preference => recommendation v1 from '2039379827-h58vw': 130

Timeout

Wait only N seconds before giving up and failing. At this point, no other virtual service or destination rule should be in effect.

First, introduce some wait time in recommendation v2 by deploying a slow version that adds a 3-second delay to every response.

kubectl apply -f manifests/kubernetes/recommendation-v2_slow.yml

Hit the customer endpoint a few times to see the load-balancing between v1 and v2, with v2 taking a bit longer to respond:

while true; do curl $customer; sleep .5; done

Then add the timeout rule:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
  - recommendation
  http:
  - route:
    - destination:
        host: recommendation
    timeout: 1.000s

kubectl apply -f manifests/istio/fault-injection/timeout/recommendation-virtual-service-timeout.yml

You will see requests either return v1 or fail with "upstream request timeout" after waiting about 1 second:

while true; do curl $customer; sleep .5; done

customer => 503 preference => 504 upstream request timeout
curl $customer  0.01s user 0.00s system 0% cpu 1.035 total
customer => preference => recommendation v1 from '2039379827-h58vw': 210
curl $customer  0.01s user 0.00s system 36% cpu 0.025 total
customer => 503 preference => 504 upstream request timeout
curl $customer  0.01s user 0.00s system 0% cpu 1.034 total
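The pattern is the same each time the slow v2 pod is chosen: the sidecar abandons the in-flight call after the deadline and surfaces a 504. A minimal sketch of that behavior, assuming illustrative names rather than Envoy's actual implementation:

```python
import concurrent.futures
import time

def call_with_timeout(handler, timeout_s):
    """Give up after timeout_s and surface a 504, mimicking the
    VirtualService `timeout` field (illustrative sketch only)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(handler)
        try:
            return 200, future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return 504, "upstream request timeout"

# A stand-in for the slow v2 pod: responds only after 0.3 s.
slow_v2 = lambda: (time.sleep(0.3), "recommendation v2")[1]
print(call_with_timeout(slow_v2, timeout_s=0.1))
```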

Clean up

Replace the slow recommendation:v2 version with the standard one.

kubectl apply -f manifests/kubernetes/recommendation-v2.yml

Delete the timeout rule.

kubectl delete -f manifests/istio/fault-injection/timeout/recommendation-virtual-service-timeout.yml