# Experiments
Everything must be cleared away between runs to make sure results don't bleed across experiments.

To run the throughput/latency experiments you'll need to set up the client machine (running the following on the machine itself):
## Billing Estimates
To get resource measurements from the hosts running experiments, we first need an inventory file at `ansible/inventory/billing.yml`, something like:

```
[all]
myhost1
myhost2
...
```

Then run the set-up playbook:

```
cd ansible
ansible-playbook -i inventory/billing.yml billing_setup.yml
```
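Before running the playbook it can save time to confirm every host in the inventory is reachable. A minimal dry-run sketch (the commands are only printed here, so nothing touches the hosts; drop the `echo`s to run for real):

```shell
# Dry-run sketch of the billing set-up flow: build each command, then print it
# instead of executing it.
INVENTORY=inventory/billing.yml

# Check every host in the inventory responds before installing anything
PING_CMD="ansible -i $INVENTORY all -m ping"

# Then run the set-up playbook itself
SETUP_CMD="ansible-playbook -i $INVENTORY billing_setup.yml"

echo "$PING_CMD"
echo "$SETUP_CMD"
```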
Data should be generated and uploaded ahead of time.
For details of the SGD experiment data see notes.
The matrix experiment data needs to be generated in bulk locally, uploaded to S3, then downloaded on the client machine (or copied directly with `scp`). You must have the native tooling and pyfaasm installed to generate it up front (but this doesn't need to be done if it's already in S3):

```
inv data.tf-upload data.tf-state
```
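If you skip S3, the direct `scp` route looks something like the following. All paths and the hostname here are placeholders, not the repo's real layout; substitute your own data directory and client machine:

```shell
# Sketch: copy locally generated matrix data straight to the client machine
# with scp instead of going via S3. Paths/host are placeholders; the command
# is printed rather than run.
SCP_CMD="scp -r ~/faasm/data/matrix myuser@myclient:~/faasm/data/"
echo "$SCP_CMD"
```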
## SGD Experiment
```
# -- Prepare --
# Upload data (one off)
inv data.reuters-state

# -- Build/upload --
inv knative.build-native sgd reuters_svm
inv upload sgd reuters_svm

# -- Deploy --
export N_WORKERS=10

# Native containers
inv knative.deploy-native sgd reuters_svm $N_WORKERS

# Wasm
inv knative.deploy $N_WORKERS

# -- Wait --
watch kn -n faasm service list
watch kubectl -n faasm get pods

# -- Run experiment --
# Native SGD
inv experiments.sgd --native $N_WORKERS 60000

# Wasm SGD
inv experiments.sgd $N_WORKERS 60000

# -- Clean up --
# Native SGD
inv knative.delete-native sgd reuters_svm

# Wasm
```
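To compare scaling you typically repeat the deploy/run/clean-up cycle at several worker counts. A sketch of that sweep for the native SGD case (the counts 2/5/10 are illustrative, and the commands are accumulated and printed rather than executed):

```shell
# Sketch: sweep the native SGD experiment over several illustrative worker
# counts, building the command list instead of running it.
RUN_LOG=""
for N_WORKERS in 2 5 10; do
    RUN_LOG="$RUN_LOG
inv knative.deploy-native sgd reuters_svm $N_WORKERS
inv experiments.sgd --native $N_WORKERS 60000
inv knative.delete-native sgd reuters_svm"
done
echo "$RUN_LOG"
```

Remember the note at the top: clean up fully between runs so results don't bleed across.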
## Tensorflow Experiment

You need to set the following environment variables for these experiments (through the knative config):

```
COLD_START_DELAY_MS=800
SGD_CODEGEN=off
PYTHON_CODEGEN=off
```
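One way to push these through the knative config is `kn service update` with its `--env` flag. The service name (`faasm-worker`) and namespace here are assumptions; check `kn -n faasm service list` for the real name. The command is built and printed, not run:

```shell
# Sketch: apply the experiment environment variables via `kn service update`.
# Service name "faasm-worker" is an assumption; command is printed only.
ENV_FLAGS="--env COLD_START_DELAY_MS=800 --env SGD_CODEGEN=off --env PYTHON_CODEGEN=off"
UPDATE_CMD="kn -n faasm service update faasm-worker $ENV_FLAGS"
echo "$UPDATE_CMD"
```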
Preamble:

```
# -- Build/upload --
inv knative.build-native tf image
inv upload tf image

# -- Upload data (one-off) --
inv data.tf-upload data.tf-state
```
Latency:

```
# -- Deploy both (note small number of workers) --
inv knative.deploy-native tf image 1
inv knative.deploy 1

# -- Run experiment --
inv experiments.tf-lat
```
Once you've done several runs, pull the results to your local machine and process them:

```
# SGD
inv experiments.sgd-pull-results <user> <host>

# Matrices
inv experiments.matrix-pull-results <user> <host>

# Inference latency
inv experiments.tf-lat-pull-results <user> <host>

# Inference throughput
```
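If you want all result sets in one pass, the pull tasks can be looped over. `CLIENT_USER`/`CLIENT_HOST` are placeholders for your client machine's login, and the commands are printed rather than run:

```shell
# Sketch: pull every result set listed above in one loop. User/host are
# placeholders; the command list is printed, not executed.
CLIENT_USER=someuser
CLIENT_HOST=somehost
PULL_CMDS=""
for TASK in sgd-pull-results matrix-pull-results tf-lat-pull-results; do
    PULL_CMDS="$PULL_CMDS
inv experiments.$TASK $CLIENT_USER $CLIENT_HOST"
done
echo "$PULL_CMDS"
```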