INDEX
Explanations
references to recommendations and comparisons
Negative Logits
rieg     -0.15
finally  -0.14
ahl      -0.14
pha      -0.14
insky    -0.14
iri      -0.14
roph     -0.14
iren     -0.14
esc      -0.14
esc      -0.14
Positive Logits
otherwise  0.38
Otherwise  0.34
Otherwise  0.32
otherwise  0.30
else       0.29
else       0.26
OTHERWISE  0.26
other      0.25
_else      0.25
jinak      0.25
Activations Density: 0.182%