Explanations
references to things or individuals that have a negative reputation or are widely known for notorious actions
mentions of the term "notorious" or "infamous"
Negative Logits
zyme       -0.78
lations    -0.77
ULTS       -0.72
strings    -0.71
eling      -0.70
cos        -0.69
pt         -0.68
planes     -0.68
plet       -0.68
hire       -0.68
Positive Logits
infamous   0.84
offender   0.83
metic      0.83
notorious  0.82
culprit    0.82
dictator   0.77
ebin       0.75
nickname   0.70
dirty      0.69
scourge    0.68
Activations Density 0.031%