# Experiments
Everything must be cleared away between runs to make sure results don't bleed across experiments.

To run the throughput/latency experiments you'll need to set up the client machine (running the following on the machine itself):
## Billing Estimates
To get resource measurements from the hosts running experiments, we first need an inventory file at `ansible/inventory/billing.yml`, something like:

```
[all]
myhost1
myhost2
...
```

Then run the set-up playbook:

```
cd ansible
ansible-playbook -i inventory/billing.yml billing_setup.yml
```
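Before running the playbook it can save time to confirm every host in the inventory is reachable. A minimal dry-run sketch (the commands are only printed here, so nothing touches the hosts; drop the `echo`s to run for real):

```shell
# Dry-run sketch of the billing set-up flow: build each command, then print it
# instead of executing it.
INVENTORY=inventory/billing.yml

# Check every host in the inventory responds before installing anything
PING_CMD="ansible -i $INVENTORY all -m ping"

# Then run the set-up playbook itself
SETUP_CMD="ansible-playbook -i $INVENTORY billing_setup.yml"

echo "$PING_CMD"
echo "$SETUP_CMD"
```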
Data should be generated and uploaded ahead of time.
For details of the SGD experiment data see notes.
The matrix experiment data needs to be generated in bulk locally, uploaded to S3, then downloaded on the client machine (or copied directly with `scp`). You must have the native tooling and pyfaasm installed to generate it up front (but this doesn't need to be done if it's already in S3):

```
inv data.tf-upload data.tf-state
```
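If you skip S3, the direct `scp` route looks something like the following. All paths and the hostname here are placeholders, not the repo's real layout; substitute your own data directory and client machine:

```shell
# Sketch: copy locally generated matrix data straight to the client machine
# with scp instead of going via S3. Paths/host are placeholders; the command
# is printed rather than run.
SCP_CMD="scp -r ~/faasm/data/matrix myuser@myclient:~/faasm/data/"
echo "$SCP_CMD"
```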
## SGD Experiment
```
# -- Prepare --
# Upload data (one off)
inv data.reuters-state

# -- Build/upload --
inv knative.build-native sgd reuters_svm
inv upload sgd reuters_svm

# -- Deploy --
export N_WORKERS=10

# Native containers
inv knative.deploy-native sgd reuters_svm $N_WORKERS

# Wasm
inv knative.deploy $N_WORKERS

# -- Wait --
watch kn -n faasm service list
watch kubectl -n faasm get pods

# -- Run experiment --
# Native SGD
inv experiments.sgd --native $N_WORKERS 60000

# Wasm SGD
inv experiments.sgd $N_WORKERS 60000

# -- Clean up --
# Native SGD
inv knative.delete-native sgd reuters_svm

# Wasm
```
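To compare scaling you typically repeat the deploy/run/clean-up cycle at several worker counts. A sketch of that sweep for the native SGD case (the counts 2/5/10 are illustrative, and the commands are accumulated and printed rather than executed):

```shell
# Sketch: sweep the native SGD experiment over several illustrative worker
# counts, building the command list instead of running it.
RUN_LOG=""
for N_WORKERS in 2 5 10; do
    RUN_LOG="$RUN_LOG
inv knative.deploy-native sgd reuters_svm $N_WORKERS
inv experiments.sgd --native $N_WORKERS 60000
inv knative.delete-native sgd reuters_svm"
done
echo "$RUN_LOG"
```

Remember the note at the top: clean up fully between runs so results don't bleed across.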
## Tensorflow Experiment

You need to set the following environment variables for these experiments (through the knative config):

```
COLD_START_DELAY_MS=800
SGD_CODEGEN=off
PYTHON_CODEGEN=off
```
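One way to push these through the knative config is `kn service update` with its `--env` flag. The service name (`faasm-worker`) and namespace here are assumptions; check `kn -n faasm service list` for the real name. The command is built and printed, not run:

```shell
# Sketch: apply the experiment environment variables via `kn service update`.
# Service name "faasm-worker" is an assumption; command is printed only.
ENV_FLAGS="--env COLD_START_DELAY_MS=800 --env SGD_CODEGEN=off --env PYTHON_CODEGEN=off"
UPDATE_CMD="kn -n faasm service update faasm-worker $ENV_FLAGS"
echo "$UPDATE_CMD"
```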
Preamble:

```
# -- Build/upload --
inv knative.build-native tf image
inv upload tf image

# -- Upload data (one-off) --
inv data.tf-upload data.tf-state
```
Latency:

```
# -- Deploy both (note small number of workers) --
inv knative.deploy-native tf image 1
inv knative.deploy 1

# -- Run experiment --
inv experiments.tf-lat
```
Once you've done several runs, pull the results to your local machine and process them:

```
# SGD
inv experiments.sgd-pull-results <user> <host>

# Matrices
inv experiments.matrix-pull-results <user> <host>

# Inference latency
inv experiments.tf-lat-pull-results <user> <host>

# Inference throughput
```
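If you want all result sets in one pass, the pull tasks can be looped over. `CLIENT_USER`/`CLIENT_HOST` are placeholders for your client machine's login, and the commands are printed rather than run:

```shell
# Sketch: pull every result set listed above in one loop. User/host are
# placeholders; the command list is printed, not executed.
CLIENT_USER=someuser
CLIENT_HOST=somehost
PULL_CMDS=""
for TASK in sgd-pull-results matrix-pull-results tf-lat-pull-results; do
    PULL_CMDS="$PULL_CMDS
inv experiments.$TASK $CLIENT_USER $CLIENT_HOST"
done
echo "$PULL_CMDS"
```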