INDEX
Explanations
information related to decisions or actions being taken
instances of the word "move" in various contexts
Negative Logits
sqor         -0.79
omial        -0.72
vae          -0.71
attm         -0.64
Koran        -0.63
errors       -0.63
ordon        -0.63
inges        -0.62
Condition    -0.61
IZE          -0.60
Positive Logits
toward       0.96
able         0.95
towards      0.95
backs        0.94
ments        0.92
ment         0.81
over         0.80
rers         0.78
abouts       0.77
forward      0.73
Activations Density: 0.038%