ProfilingConfig
Module: fast_llm.profile
Fields
cuda (core)
Type: bool
Default: False
Profile the CUDA operations on the GPU side.
cpu (feature)
Type: bool
Default: False
Profile the CUDA operations on the CPU side.
cycles (optional)
Type: int
Default: 1
Profile this many iterations in each profiling cycle.
ranks (feature)
Type: set[int]
Default: set()
Profile only on the specified ranks.
skip (optional)
Type: int
Default: 1
Skip this many iterations before starting the profiler for the first time.
wait (optional)
Type: int
Default: 0
Wait this many iterations before each profiling cycle.
warmup (optional)
Type: int
Default: 1
Warm up the profiler for this many iterations before each profiling cycle, i.e., enable the profiler but discard the results.
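The interaction between skip, wait, warmup and cycles can be easier to follow with a small sketch. The function below is not part of fast_llm; profiler_phase is a hypothetical helper. It assumes the four fields combine the way torch.profiler.schedule combines skip_first, wait, warmup and active (one initial skip, then repeating wait / warmup / record phases) and that iterations are counted from 0.

```python
def profiler_phase(step: int, skip: int = 1, wait: int = 0, warmup: int = 1, cycles: int = 1) -> str:
    """Return "idle", "warmup" or "record" for a 0-based iteration index.

    Hypothetical helper: assumes the fields behave like the arguments of
    torch.profiler.schedule (skip -> skip_first, cycles -> active).
    """
    if step < skip:
        # The initial skip is applied only once, before the first cycle.
        return "idle"
    # Position of this iteration inside the repeating wait/warmup/record cycle.
    position = (step - skip) % (wait + warmup + cycles)
    if position < wait:
        return "idle"
    if position < wait + warmup:
        # Profiler is enabled, but the results are discarded.
        return "warmup"
    return "record"


# With the default values (skip=1, wait=0, warmup=1, cycles=1):
phases = [profiler_phase(step) for step in range(6)]
print(phases)
# ['idle', 'warmup', 'record', 'warmup', 'record', 'warmup']
```

Under these assumptions, the defaults record every second iteration after the first one. Raising wait spaces the recorded iterations further apart.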
averages (logging)
Type: bool
Default: False
Log a table of average and total properties for each CUDA operation.
export (logging)
Type: bool
Default: False
Export the raw profile as an artifact in Chrome trace format. The profile is saved to profile_chrome_step_{step}. It can be loaded with Google Chrome (by typing chrome://tracing/ in the address bar) or with TensorBoard.
log (logging)
Type: bool
Default: False
Log the profile tables to stdout; otherwise, save them as artifacts.
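Besides Chrome and TensorBoard, a trace written by the export option can be inspected with a few lines of Python. The sketch below is not part of fast_llm. It assumes the exported file is a JSON object with a "traceEvents" list, as in the Chrome trace event format, where complete events have "ph" equal to "X" and a "dur" in microseconds. The helper name total_duration_by_name and the sample events are made up for illustration.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace: dict) -> dict[str, float]:
    """Sum the durations (microseconds) of complete ("X") events, grouped by name."""
    totals: dict[str, float] = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Metadata and instant events carry no duration, so skip them.
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


# A tiny hand-written trace standing in for a real exported file.
# For a real file, use json.load on the opened artifact instead.
example_trace = json.loads("""
{"traceEvents": [
  {"name": "aten::mm", "ph": "X", "ts": 0, "dur": 120},
  {"name": "aten::add", "ph": "X", "ts": 130, "dur": 15},
  {"name": "aten::mm", "ph": "X", "ts": 150, "dur": 100},
  {"name": "process_name", "ph": "M"}
]}
""")

totals = total_duration_by_name(example_trace)
print(totals)
# {'aten::mm': 220.0, 'aten::add': 15.0}
```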
table_width (logging)
Type: int
Default: 80
Target width for logged tables, in characters.
trace (logging)
Type: bool
Default: False
Log a table of every CUDA operation in chronological order.