C. Run on Your Use Case
This page guides you through running the app on your own data and pipelines, using Docker. Azimuth supports different datasets and text classification models.
Launch Azimuth with no pipeline, or with multiple pipelines
Azimuth supports specifying no pipelines, to only perform dataset analysis. It also supports supplying multiple pipelines, to allow for quick comparison. However, only one dataset per config is allowed.
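As a rough sketch of what this looks like in the config, the `pipelines` field can hold several entries (one per pipeline to compare); the entries below reuse the fields from the CLINC example shown further down, with hypothetical checkpoint paths. For a dataset-only analysis, the assumption here is that `pipelines` is simply left out of the config.

```json
"pipelines": [
  {
    "model": {
      "class_name": "loading_resources.load_hf_text_classif_pipeline",
      "remote": "/azimuth_shr",
      "kwargs": {"checkpoint_path": "/azimuth_shr/checkpoint_v1"}
    }
  },
  {
    "model": {
      "class_name": "loading_resources.load_hf_text_classif_pipeline",
      "remote": "/azimuth_shr",
      "kwargs": {"checkpoint_path": "/azimuth_shr/checkpoint_v2"}
    }
  }
]
```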
The simplest scenario is when you have a HuggingFace (HF) dataset and model. For the sake of simplicity, the instructions below cover this scenario. However, you will quickly need to learn about the Configuration details and Custom Objects to launch more complex use cases.
1. Prepare the Config File
Run our demo first
You haven't run our demo yet? You might want to verify your setup before feeding your own model and dataset. Go back to B. Learn Basics.
Start from an existing config and edit the relevant fields to adapt it to your dataset and models. Examples with a HuggingFace (HF) dataset and model are available in `config/examples` (the CLINC config is also shown below).
- Put your model checkpoint (the result of `.save_pretrained()`) under the folder `azimuth_shr`.
- In `config`, copy `config/examples/clinc_oos/conf.json` to a new folder with your project name. Ex: `config/my_project/conf.json` (see the shell sketch after this list).
- Edit the config:
    - `name`: put your project name.
    - `dataset.args`: specify the args required to load your dataset with `datasets.load_dataset`.
    - Edit `columns` and `rejection_class` based on the dataset.
    - `pipelines.model.kwargs.checkpoint_path`: put the path to your own model checkpoint. The path should start with `/azimuth_shr`, since this folder will be mounted on Docker.
    - Edit `saliency_layer` so it is the name of the input layer of the model. Set it to `null` if your model is not a PyTorch model or has no word-embedding layer.
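As a quick sketch of the copy step above, run from the `azimuth` root directory (`my_project` is a placeholder project name):

```bash
# Create your project folder and start from the CLINC example config.
mkdir -p config/my_project
cp config/examples/clinc_oos/conf.json config/my_project/conf.json
```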
Links to full reference
If you need more details on some of these fields:
- The Project Config explains `name`, `dataset`, `columns` and `rejection_class` in more detail.
- The Model Contract Config details how to define `pipelines`, `model_contract` and `saliency_layer`.
{
"name": "CLINC150", # (1)
"dataset": {
"class_name": "datasets.load_dataset", # (2)
"args": [ # (3)
"clinc_oos",
"imbalanced"
]
},
"columns": { # (4)
"text_input": "text",
"label": "intent"
},
"rejection_class": "oos", # (5)
"model_contract": "hf_text_classification", # (6)
"pipelines": [ # (7)
{
"model": {
"class_name": "loading_resources.load_hf_text_classif_pipeline", # (8)
"remote": "/azimuth_shr", # (9)
"kwargs": { # (10)
"checkpoint_path": "transformersbook/
distilbert-base-uncased-distilled-clinc"
}
}
}
],
"saliency_layer": "distilbert.embeddings.word_embeddings", # (11)
}
1. Name for your project. Shown in the application to identify your config.
2. If the dataset is a HF dataset, use this `class_name`.
3. `kwargs` to send to the `class_name`.
4. Specify the names of the dataset columns, such as the column with the utterance and the label.
5. Specify the value if a rejection option is present in the classes.
6. If the pipeline is a HF pipeline, use this `model_contract`.
7. Multiple ML pipelines can be listed to be available in the webapp.
8. If this is a HF pipeline, use this `class_name`.
9. Change only if `class_name` is not found in `/azimuth_shr`.
10. `kwargs` to send to the class. Only `checkpoint_path` is needed if you use the class above.
11. Name of the layer on which to compute saliency maps.
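If you are unsure what to put in `dataset.args` or `columns`, it can help to load the dataset once in plain Python, outside Azimuth. This is only a sketch using the HF `datasets` library; the commented values correspond to the CLINC example above:

```python
from datasets import load_dataset

# Same arguments as dataset.class_name / dataset.args in the config above.
dataset = load_dataset("clinc_oos", "imbalanced")

# Column names to use for columns.text_input and columns.label.
print(dataset["train"].column_names)  # e.g. ['text', 'intent']

# Class names, to check the value to use for rejection_class (e.g. "oos").
print(dataset["train"].features["intent"].names[:5])
```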
2. Running the App
- In the terminal, go to the `azimuth` root directory.
- Set `CFG_PATH=/config/my_project/conf.json` with the location of the config.
    - The initial `/` is required, as your local config folder will be mounted on the Docker container at the root.
- Execute the launch command (a sketch is shown after this list).
- The start-up tasks will begin, and the app will be accessible at http://localhost:8080 after a few minutes. The back-end API will be accessible at http://localhost:8080/api/local/docs.
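The exact launch command lives in the repository; the sketch below assumes a Docker-based `make launch` target (check the repository README for the exact command on your platform):

```bash
# Hypothetical launch command: points Azimuth at your config inside the container.
CFG_PATH=/config/my_project/conf.json make launch
```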
After a successful start, Azimuth saves the provided config in its `config_history.jsonl` artifact. If you use the API to edit the config, the edits are saved there. If you restart Azimuth (for example after shutting it down for the night), you can resume where you left off by setting `LOAD_CONFIG_HISTORY=1` and a `CFG_PATH` together, in which case Azimuth will automatically:

- load the config from `CFG_PATH` when it first starts (if `config_history.jsonl` is empty); and
- load the config from `config_history.jsonl` from then on (if Azimuth is restarted).
For example:
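A sketch, assuming the same `make launch` target as in the previous section:

```bash
# Load config_history.jsonl if present; otherwise fall back to CFG_PATH.
LOAD_CONFIG_HISTORY=1 CFG_PATH=/config/my_project/conf.json make launch
```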
Although confusing, this enables you to stop and restart the Docker container with the same command.

Advanced Settings
Additional Config Fields
The Configuration reference details all the additional fields that can be set, such as how behavioral tests are executed, which encoder is used for the similarity analysis, the batch size, and so on.
Environment variables
No matter where you launch the app from, you can always configure some options through environment variables. They are all redundant with the config attributes, so you can set them in either place. They are the following:
- Specify the threshold of your model by passing `TH` (ex: `TH=0.6`, or `NaN` if there is no threshold) in the command. If multiple pipelines are defined, the threshold will apply to all of them.
- Similarly, pass `TEMP` (ex: `TEMP=3`) to set the temperature of the model.
- Disable behavioral tests and similarity analysis by passing `BEHAVIORAL_TESTING=null` and `SIMILARITY=null` respectively.
- Specify the name of the project by passing `NAME`.
- Specify the device on which to run Azimuth with `DEVICE`, one of `auto`, `gpu` or `cpu`. If none is provided, `auto` is used. Ex: `DEVICE=gpu`.
- Specify `READ_ONLY_CONFIG=1` to lock the config once Azimuth is launched.
- Specify `LOAD_CONFIG_HISTORY=1` to load the latest config from Azimuth's config history.
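Several of these variables can be combined in one launch command. A sketch, again assuming the `make launch` target used above:

```bash
# Hypothetical launch: 0.6 threshold, behavioral tests disabled, config locked.
TH=0.6 BEHAVIORAL_TESTING=null READ_ONLY_CONFIG=1 \
  CFG_PATH=/config/my_project/conf.json make launch
```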
Config file prevails over environment variables
Remember that the values above are defined in the config too. If conflicting values are defined, values from the config file will prevail.