Project Config
The project configuration contains mainly mandatory fields that specify the dataset to load in Azimuth and details about the way it is handled by the app.
from typing import Optional

from pydantic import BaseSettings, Field

from azimuth.config import ColumnConfiguration, CustomObject


class ProjectConfig(BaseSettings):
    name: str = "New project"
    dataset: CustomObject
    columns: ColumnConfiguration = ColumnConfiguration()
    rejection_class: Optional[str] = "REJECTION_CLASS"
Name
🟡 Default value: New project
Environment Variable: NAME
Any name can be set for the config. For example, it can represent the name of the dataset and/or the model. Ex: Banking77 Model v4.
Dataset
🔴 Mandatory field
To define which dataset to load in the application, Azimuth uses Custom Objects.
If the dataset is already on HuggingFace, you can use datasets.load_dataset from HF, as shown in the example below. If you have your own dataset, you will need to create your own custom object, as explained in Defining Dataset.
from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, Field


class CustomObject(BaseModel):
    class_name: str  # (1)
    args: List[Union["CustomObject", Any]] = []
    kwargs: Dict[str, Union["CustomObject", Any]] = {}
    remote: Optional[str] = None  # (2)
- Name of the function or class that is located in remote. args and kwargs will be sent to the function/class.
- Absolute path to class. class_name needs to be accessible from this path.
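To make the roles of class_name, args, kwargs and remote more concrete, here is a minimal sketch of how such an object could be turned into a Python call. It is an illustration only, not Azimuth's own loader: the helper `resolve_custom_object` is hypothetical, it works on plain dictionaries rather than on the pydantic model, and it assumes that remote is a directory to add to the import path.

```python
import importlib
import sys
from typing import Any, Dict


def resolve_custom_object(spec: Dict[str, Any]) -> Any:
    """Hypothetical helper: import `class_name` and call it with args/kwargs."""
    remote = spec.get("remote")
    if remote and remote not in sys.path:
        # Assumption: `remote` is a directory from which `class_name` is importable.
        sys.path.insert(0, remote)

    # "package.module.attr" -> import "package.module", then fetch "attr".
    module_name, _, attr_name = spec["class_name"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)

    def resolve(value: Any) -> Any:
        # Nested custom objects (dicts with a "class_name" key) are resolved first.
        if isinstance(value, dict) and "class_name" in value:
            return resolve_custom_object(value)
        return value

    args = [resolve(a) for a in spec.get("args", [])]
    kwargs = {k: resolve(v) for k, v in spec.get("kwargs", {}).items()}
    return target(*args, **kwargs)


# Standard-library stand-in for a dataset loader, so the sketch runs anywhere.
fraction = resolve_custom_object(
    {"class_name": "fractions.Fraction", "args": [3], "kwargs": {"denominator": 4}}
)
print(fraction)  # 3/4
```

The same pattern would apply to a dataset loader: class_name names the callable, and args and kwargs are forwarded to it unchanged.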
Example to load banking77 from HF.
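A configuration along the following lines would load banking77 through datasets.load_dataset. This is a sketch, written as a Python dictionary and printed as JSON. It assumes that `banking77` is the identifier expected by datasets.load_dataset on the HuggingFace Hub and that the config is ultimately serialized as JSON; the project name is borrowed from the Name section.

```python
import json

# Sketch of a project config; "banking77" is assumed to be the dataset
# identifier that datasets.load_dataset expects.
project_config = {
    "name": "Banking77 Model v4",
    "dataset": {
        "class_name": "datasets.load_dataset",
        "args": ["banking77"],
    },
}

config_json = json.dumps(project_config, indent=2)
print(config_json)
```

Only dataset is mandatory here; columns and rejection_class fall back to their default values when omitted.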
Columns
🟡 Default value: ColumnConfiguration()
All dataset column names are configurable. The mandatory columns and their descriptions are as follows:
Field name | Default | Description |
---|---|---|
text_input | utterance | The preprocessed utterance. |
label | label | The class label for the utterance, as type datasets.ClassLabel. |
persistent_id | row_idx | A unique identifier for each utterance, as type datasets.Value("int16") or datasets.Value("string"). |
from pydantic import BaseModel


class ColumnConfiguration(BaseModel):
    text_input: str = "utterance"  # (1)
    raw_text_input: str = "utterance_raw"  # (2)
    label: str = "label"  # (3)
    failed_parsing_reason: str = "failed_parsing_reason"  # (4)
    persistent_id: str = "row_idx"  # (5)
- Column for the text input that will be sent to the pipeline.
- Optional column for the raw text input (before any pre-processing). Unused at the moment.
- Features column for the label.
- Optional column to specify whether an example has failed preprocessing. Unused at the moment.
- Column with a unique identifier for every example that should be persisted if the dataset is modified, such as if new examples are added or if examples are modified or removed. It defaults to the Azimuth generated index.
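As an illustration, a dataset whose text and label columns have different names only needs to override those two fields. The sketch below mirrors the defaults listed above in a plain dictionary; the column names `text` and `intent` and the helper `effective_columns` are hypothetical and not part of Azimuth.

```python
from typing import Dict

# Defaults copied from ColumnConfiguration above.
DEFAULT_COLUMNS: Dict[str, str] = {
    "text_input": "utterance",
    "raw_text_input": "utterance_raw",
    "label": "label",
    "failed_parsing_reason": "failed_parsing_reason",
    "persistent_id": "row_idx",
}


def effective_columns(overrides: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: merge user overrides into the default column names."""
    unknown = set(overrides) - set(DEFAULT_COLUMNS)
    if unknown:
        raise KeyError(f"Unknown column fields: {sorted(unknown)}")
    return {**DEFAULT_COLUMNS, **overrides}


# A dataset that stores utterances in "text" and labels in "intent".
columns = effective_columns({"text_input": "text", "label": "intent"})
print(columns["text_input"], columns["label"], columns["persistent_id"])
# text intent row_idx
```

Fields that are not overridden keep their defaults, so persistent_id still points at row_idx in this case.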
Rejection class
🟡 Default value: REJECTION_CLASS
The field rejection_class requires the class to be present in the dataset. If your dataset doesn't have a rejection class, set the value to null. More details on the rejection class are available in Prediction Outcomes.
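The requirement can be sketched as a simple check. The helper `check_rejection_class` and the class names below are hypothetical, and the check rests on an assumption about what "present in the dataset" means (membership in the label's class names); it is not Azimuth's own validation.

```python
from typing import List, Optional


def check_rejection_class(class_names: List[str], rejection_class: Optional[str]) -> None:
    """Hypothetical check: a configured rejection class must be a dataset class."""
    if rejection_class is None:
        # Corresponds to null in the config: the dataset has no rejection class.
        return
    if rejection_class not in class_names:
        raise ValueError(
            f"rejection_class {rejection_class!r} is not in the dataset classes; "
            "set it to null if the dataset has no rejection class."
        )


# Passes: the default rejection class is one of the dataset classes.
check_rejection_class(["card_arrival", "REJECTION_CLASS"], "REJECTION_CLASS")

# Passes: no rejection class in the dataset, so the value is set to None (null).
check_rejection_class(["card_arrival", "lost_card"], None)
```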