Explanations
words related to emphasis, certainty, and negation
strongly negative or absolute expressions
Negative Logits
odore      -0.75
ipel       -0.74
inav       -0.71
igration   -0.71
opol       -0.68
adem       -0.66
iasis      -0.65
urers      -0.64
midt       -0.64
lahoma     -0.63
Positive Logits
THING      1.20
THERE      1.20
ELY        1.19
MUCH       1.10
FUCK       1.07
VER        1.07
THEN       1.03
NOT        1.01
ADA        1.01
BE         1.01
Activations Density 0.115%