INDEX
Explanations
phrases containing political or authoritative statements that are emphasized with special characters
symbols indicating emphasis or special notation
Negative Logits
eers                -0.67
playbook            -0.66
sew                 -0.65
nod                 -0.64
diversion           -0.64
chall               -0.64
senal               -0.63
guiActiveUnfocused  -0.63
iewicz              -0.63
lair                -0.63
Positive Logits
there    0.94
certain  0.91
ihad     0.87
nob      0.86
someone  0.85
while    0.84
fter     0.83
these    0.82
they     0.82
return   0.80
Activations Density: 0.138%