Explanations
phrases that indicate significant political actions or changes
Negative Logits
ooter        -0.16
INUX         -0.14
oot          -0.14
yerini       -0.13
اÙĬا         -0.13
indr         -0.13
(Arg         -0.13
itte         -0.13
minecraft    -0.13
çĩ           -0.13
Positive Logits
move         0.86
move         0.66
Move         0.63
-move        0.60
moves        0.60
Move         0.58
decision     0.57
_move        0.52
MOVE         0.51
.move        0.47
Activations Density 0.360%