Project Config

The project configuration contains mandatory fields that specify the dataset to load in Azimuth.

Class Definition

from typing import Optional

from pydantic import BaseSettings, Field

from azimuth.config import ColumnConfiguration, CustomObject

class ProjectConfig(BaseSettings):
    name: str = Field("New project", env="NAME")  # can also be set with the NAME env var
    dataset: CustomObject  # mandatory: no default value
    columns: ColumnConfiguration = ColumnConfiguration()
    rejection_class: Optional[str] = "REJECTION_CLASS"

Config Example

{
  "name": "Banking77 Model v4",
  "dataset": {
    "class_name": "datasets.load_dataset",
    "args": [
      "banking77"
    ]
  },
  "columns": {
    "text_input": "text",
    "label": "target"
  },
  "rejection_class": "NA"
}
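As a rough illustration of which fields are mandatory, the sketch below parses a config like the one above with the standard `json` module, rejects it if `dataset` is missing, and fills in the documented defaults for the other fields. The function name `load_project_config` is hypothetical; Azimuth itself validates the file through the pydantic model shown above, and this sketch ignores the `NAME` environment variable.

```python
import json
from typing import Any, Dict

# Documented defaults for the optional ProjectConfig fields.
# An empty "columns" dict stands for ColumnConfiguration() with all its defaults.
DEFAULTS: Dict[str, Any] = {
    "name": "New project",
    "columns": {},
    "rejection_class": "REJECTION_CLASS",
}


def load_project_config(text: str) -> Dict[str, Any]:
    """Hypothetical helper: parse a project config and apply defaults."""
    raw = json.loads(text)
    if "dataset" not in raw:
        # `dataset` has no default, so the config must provide it.
        raise ValueError("Missing mandatory field: dataset")
    # Values present in the file override the defaults.
    return {**DEFAULTS, **raw}


config = load_project_config(
    '{"dataset": {"class_name": "datasets.load_dataset", "args": ["banking77"]}}'
)
print(config["name"], config["rejection_class"])  # New project REJECTION_CLASS
```

An explicit `"rejection_class": null` in the file is kept as `None` rather than replaced by the default.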

Name

Default value: New project

Environment Variable: NAME

Any name can be set for the config. For example, it can represent the name of the dataset and/or the model, such as `Banking77 Model v4`.
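A minimal sketch of how the three possible sources of `name` could combine, assuming the usual pydantic `BaseSettings` priority order: a value given explicitly in the config wins over the `NAME` environment variable, which wins over the default. The function `resolve_name` is hypothetical and only illustrates that order.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_NAME = "New project"


def resolve_name(config: Mapping[str, Any], environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the project name from config, env, or default."""
    env = os.environ if environ is None else environ
    if "name" in config:
        return config["name"]  # explicit value in the config file
    if "NAME" in env:
        return env["NAME"]  # fall back to the NAME environment variable
    return DEFAULT_NAME  # documented default


print(resolve_name({}, {"NAME": "Banking77 Model v4"}))  # Banking77 Model v4
```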

Dataset

Mandatory field

To define which dataset to load in the application, Azimuth uses Custom Objects.

If the dataset is already on the Hugging Face Hub, you can use `datasets.load_dataset` from HF, as shown in the example below. If you have your own dataset, you need to create your own custom object, as explained in Defining Dataset.

Custom Object Definition

from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, Field

class CustomObject(BaseModel):
    class_name: str = Field(..., title="Class name to load")
    args: List[Union["CustomObject", Any]] = []
    kwargs: Dict[str, Union["CustomObject", Any]] = {}
    remote: Optional[str] = None # (1)

(1) Absolute path to the directory containing the code for `class_name`.

Config Example with HF

Example of loading `banking77` from HF:

{
  "dataset": {
    "class_name": "datasets.load_dataset",
    "args": [
      "banking77"
    ]
  }
}
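To make the `class_name`/`args`/`kwargs` fields concrete, here is a simplified sketch of how such an object could be resolved: import the module part of the dotted path, look up the attribute, resolve nested custom objects recursively, then call the result. The function `load_custom_object` is hypothetical and is not Azimuth's actual loader; treating `remote` as a directory to prepend to `sys.path` is an assumption. The usage line calls `math.pow` so that it runs without downloading a dataset.

```python
import importlib
import sys
from typing import Any, Dict


def _is_custom_object(value: Any) -> bool:
    # In this sketch, a nested custom object is a dict with a "class_name" key.
    return isinstance(value, dict) and "class_name" in value


def load_custom_object(obj: Dict[str, Any]) -> Any:
    """Hypothetical resolver for a custom-object dict."""
    remote = obj.get("remote")
    if remote and remote not in sys.path:
        # Assumption: `remote` is a directory containing the module.
        sys.path.insert(0, remote)
    module_name, _, attr_name = obj["class_name"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    # Resolve nested custom objects before calling the target.
    args = [load_custom_object(a) if _is_custom_object(a) else a for a in obj.get("args", [])]
    kwargs = {
        k: load_custom_object(v) if _is_custom_object(v) else v
        for k, v in obj.get("kwargs", {}).items()
    }
    return target(*args, **kwargs)


print(load_custom_object({"class_name": "math.pow", "args": [2, 3]}))  # 8.0
```

With the HF config above, the same resolver would end up calling `datasets.load_dataset("banking77")`.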

Columns

Default value: ColumnConfiguration()

All dataset column names are configurable. The mandatory columns and their descriptions are as follows:

Field name	Default	Description
`text_input`	`utterance`	The preprocessed utterance.
`label`	`label`	The class label for the utterance, as type `datasets.ClassLabel`.
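A quick sanity check can catch a misconfigured column before launching. The sketch below compares a dataset's column names against the mandatory columns of a column configuration, here a plain dict standing in for `ColumnConfiguration`. The helper `missing_columns` is hypothetical.

```python
from typing import Dict, Iterable, List

# Defaults for the mandatory columns, taken from the table above.
DEFAULT_COLUMNS: Dict[str, str] = {"text_input": "utterance", "label": "label"}
MANDATORY_FIELDS = ("text_input", "label")


def missing_columns(dataset_columns: Iterable[str], columns: Dict[str, str]) -> List[str]:
    """Hypothetical check: mandatory column names absent from the dataset."""
    present = set(dataset_columns)
    merged = {**DEFAULT_COLUMNS, **columns}  # overrides replace defaults
    return [merged[f] for f in MANDATORY_FIELDS if merged[f] not in present]


print(missing_columns(["text", "target"], {}))  # ['utterance', 'label']
```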

Class Definition

from pydantic import BaseModel

class ColumnConfiguration(BaseModel):
    text_input: str = "utterance" # (1)
    raw_text_input: str = "utterance_raw" # (2)
    label: str = "label" # (3)
    failed_parsing_reason: str = "failed_parsing_reason" # (4)

(1) Column for the text input that will be sent to the pipeline.
(2) Optional column for the raw text input (before any preprocessing). Unused at the moment.
(3) Column for the label, stored as a `datasets.ClassLabel` feature.
(4) Optional column to specify whether an example has failed preprocessing. Unused at the moment.

Config Example

Example of overriding the default column values:

{
  "columns": {
    "text_input": "text",
    "label": "target"
  }
}
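Fields omitted from `columns` keep their default values. A stdlib-only sketch of that merge, using a dataclass as a hypothetical stand-in for the pydantic `ColumnConfiguration` model:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ColumnConfiguration:
    """Stdlib stand-in mirroring the documented defaults."""
    text_input: str = "utterance"
    raw_text_input: str = "utterance_raw"
    label: str = "label"
    failed_parsing_reason: str = "failed_parsing_reason"


config = json.loads('{"columns": {"text_input": "text", "label": "target"}}')
# Keys present in the config override the defaults; the others are kept.
columns = ColumnConfiguration(**config["columns"])
print(asdict(columns))
```

In this sketch, an unknown key under `columns` raises a `TypeError` from the dataclass constructor.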

Rejection class

Default value: REJECTION_CLASS

The `rejection_class` field requires the class to be present in the dataset. If your dataset doesn't have a rejection class, set the value to `null`. More details on the rejection class are available in Prediction Outcomes.
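The constraint on `rejection_class` can be sketched as a small check against the dataset's class names (for a `datasets.ClassLabel` feature, these would be its `names`). The function `check_rejection_class` is hypothetical; `None` corresponds to `null` in the JSON config.

```python
from typing import Optional, Sequence


def check_rejection_class(rejection_class: Optional[str], class_names: Sequence[str]) -> None:
    """Hypothetical check mirroring the rule described above."""
    if rejection_class is None:
        return  # null: the dataset has no rejection class
    if rejection_class not in class_names:
        raise ValueError(
            f"rejection_class {rejection_class!r} is not one of the dataset classes; "
            "set it to null if the dataset has no rejection class"
        )


check_rejection_class("NA", ["greeting", "NA"])  # passes
check_rejection_class(None, ["greeting"])  # passes
```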