ProfilingConfig

Module: fast_llm.profile

Fields

cuda (core)

Type: bool    Default: False

Profile the CUDA operations on the GPU side.

cpu (feature)

Type: bool    Default: False

Profile the CUDA operations on the CPU side.

cycles (optional)

Type: int    Default: 1

Profile this many iterations in each profiling cycle.

ranks (feature)

Type: set[int]    Default: set()

Profile only on the specified ranks.

skip (optional)

Type: int    Default: 1

Skip this many iterations before starting the profiler for the first time.

wait (optional)

Type: int    Default: 0

Wait this many iterations before each profiling cycle.

warmup (optional)

Type: int    Default: 1

Warm up the profiler for this many iterations before each profiling cycle, i.e., enable the profiler but discard the results.
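The iteration-count fields (skip, wait, warmup, cycles) together describe a repeating schedule. The sketch below shows one plausible reading of how they combine, assuming the initial skip happens once and each subsequent cycle runs wait, then warmup, then recording, in that order (the same shape as `torch.profiler.schedule`). The function name `profiler_phase` and the phase labels are hypothetical and not part of Fast-LLM.

```python
def profiler_phase(step: int, skip: int = 1, wait: int = 0, warmup: int = 1, cycles: int = 1) -> str:
    """Return the assumed profiler phase for a zero-based training iteration.

    Assumption: `skip` applies once at the start, then the pattern
    wait -> warmup -> record repeats. `cycles` must be at least 1.
    """
    if step < skip:
        return "skip"
    period = wait + warmup + cycles
    offset = (step - skip) % period
    if offset < wait:
        return "wait"  # profiler idle
    if offset < wait + warmup:
        return "warmup"  # profiler enabled, results discarded
    return "record"  # profiler enabled, results kept


# Example: skip 1 iteration, then repeat 2 wait + 1 warmup + 2 recorded iterations.
phases = [profiler_phase(step, skip=1, wait=2, warmup=1, cycles=2) for step in range(11)]
print(phases)
```

With the defaults listed on this page (skip 1, wait 0, warmup 1, cycles 1), this reading gives one skipped iteration followed by alternating warmup and recorded iterations.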

averages (logging)

Type: bool    Default: False

Log a table of average and total properties for each CUDA operation.

export (logging)

Type: bool    Default: False

Export the raw profile as an artifact in Chrome trace format. The profile is saved to profile_chrome_step_{step}. It can be loaded in Google Chrome (by typing chrome://tracing/ in the address bar) or in TensorBoard.
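An exported trace can also be inspected offline without a viewer. The sketch below assumes the exported file is JSON in the Chrome Trace Event format, where complete events carry `"ph": "X"` and a `dur` field in microseconds. The helper `total_duration_by_name` is hypothetical, and the location of the exported artifact depends on your run setup.

```python
import json
from collections import defaultdict
from pathlib import Path


def total_duration_by_name(trace_path):
    """Sum the duration (in microseconds) of complete events, grouped by event name."""
    data = json.loads(Path(trace_path).read_text())
    # The trace format allows either an object with "traceEvents" or a bare list of events.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return dict(totals)


# Usage (hypothetical path to an exported artifact):
# totals = total_duration_by_name("path/to/profile_chrome_step_100")
# for name, microseconds in sorted(totals.items(), key=lambda item: -item[1])[:10]:
#     print(f"{microseconds:12.1f} us  {name}")
```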

log (logging)

Type: bool    Default: False

Log the profile tables to stdout; otherwise, save them as artifacts.

table_width (logging)

Type: int    Default: 80

Target width for logged tables, in characters.

trace (logging)

Type: bool    Default: False

Log a table of every CUDA operation in chronological order.

Used in