Defining Dataset
The Project Config section describes how a dataset must be defined in the
config with a Custom Object. This section details how to define the
class_name, args and kwargs of that custom object.
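As a rough illustration of the three fields named above, the sketch below builds a custom object as a Python dictionary and prints it as JSON. The top-level key `dataset`, the module path `my_project.loaders.load_your_dataset` and the `max_rows` keyword argument are assumptions made for this example only; check the Project Config for the exact layout.

```python
import json

# Custom object describing the dataset loader.
dataset_custom_object = {
    # Hypothetical dotted path to the loading function.
    "class_name": "my_project.loaders.load_your_dataset",
    # Positional arguments for the loader (none in this sketch).
    "args": [],
    # Hypothetical keyword argument, assumed to reach the loader's **kwargs.
    "kwargs": {"max_rows": 1000},
}

# Assumption: the custom object sits under a "dataset" key in the config.
config_fragment = {"dataset": dataset_custom_object}

print(json.dumps(config_fragment, indent=2))
```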
Dataset Definition
Azimuth supports the HuggingFace Dataset API. The loading function for the dataset must respect the following contract:
from datasets import DatasetDict
from azimuth.config import AzimuthConfig
def load_your_dataset(azimuth_config: AzimuthConfig, **kwargs) -> DatasetDict:
    ...
You don't have a HuggingFace Dataset?
If your dataset is not a HuggingFace Dataset, you can convert it easily using the following
resources from HuggingFace:
We suggest following this HuggingFace tutorial to learn more about loading datasets with HuggingFace.
Dataset splits
Azimuth looks for a train split and one of the validation or test splits. If
both validation and test are available, Azimuth uses validation. The train split is optional: Azimuth can run without it.
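The selection rule can be pictured with a small pure-Python sketch. `select_splits` is a hypothetical helper written for this page, not Azimuth's own code; it only mirrors the rule stated above.

```python
from typing import Iterable, Optional, Tuple


def select_splits(available: Iterable[str]) -> Tuple[Optional[str], str]:
    """Return (train split or None, evaluation split) from the split names."""
    names = set(available)
    # The train split is optional.
    train = "train" if "train" in names else None
    # Prefer validation over test when both are present.
    for candidate in ("validation", "test"):
        if candidate in names:
            return train, candidate
    raise ValueError("Expected a 'validation' or 'test' split.")
```

For instance, `select_splits(["train", "validation", "test"])` picks `validation` as the evaluation split, while `select_splits(["test"])` falls back to `test` with no train split.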
Column names and rejection class
Go to the Project Config to see other attributes that should be set along with the dataset.
Example
Using this API, we can load SST2, a sentiment analysis dataset.
Note: in this case, we can omit azimuth_config from the definition because we don't need it.