INDEX
Explanations
percentages, time durations, and counts
Negative Logits
loo        -0.73
chwitz     -0.66
multipl    -0.66
achine     -0.66
lay        -0.66
raught     -0.64
comb       -0.63
unknown    -0.63
andan      -0.62
trap       -0.62
Positive Logits
percent    0.90
cents      0.88
hours      0.86
minutes    0.86
%.         0.82
weeks      0.81
%          0.78
seconds    0.78
calories   0.75
feet       0.73
Activations Density: 0.123%
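
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, might look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical; the sample is sized only so that the result matches the figure shown above and is not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (positions where the feature fires) / (all positions).
    This is an illustrative definition, not one confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 123 of which activate the feature.
sample = [0.0] * 99_877 + [1.5] * 123
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density this low means the feature fires on roughly one token position in every 800, which fits a feature whose top positive logits are a narrow set of unit and percentage tokens.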