Explanations
descriptions of actions or decisions
instances of the word "move" in various contexts
Negative Logits
omial      -0.78
oola       -0.71
sqor       -0.66
iciency    -0.65
Barton     -0.65
Cav        -0.64
inges      -0.64
acha       -0.63
icum       -0.63
sung       -0.62
Positive Logits
able       0.90
toward     0.81
Motion     0.81
backs      0.80
ivism      0.78
itures     0.78
over       0.76
towards    0.76
ments      0.75
forward    0.75
Activations Density: 0.034%