INDEX
Explanations
phrases indicating imminent action or threats
Negative Logits
Token       Logit
Fab         -0.15
pand        -0.15
dfa         -0.14
Fab         -0.14
erti        -0.14
skill       -0.13
âĨ          -0.13
ÑģÑĤеÑĢ      -0.13
thought     -0.13
ACK         -0.13
Positive Logits
Token       Logit
Force       0.17
andom       0.16
orse        0.15
ç´¢         0.15
Å©          0.15
force       0.14
Handle      0.14
FORCE       0.14
force       0.14
Force       0.14
Activations Density 0.259%