INDEX
Explanations
references to clowns, particularly in relation to controversial or mocking contexts
references to clowns and associated themes
Negative Logits
Token        Logit
ECD          -0.75
WARD         -0.74
ASED         -0.68
dated        -0.66
equality     -0.65
neau         -0.65
galitarian   -0.65
EMP          -0.64
CE           -0.63
RA           -0.62
Positive Logits
Token        Logit
Clown        1.03
fish         0.96
clown        0.93
amn          0.84
ey           0.81
erey         0.79
obyl         0.78
tail         0.77
opard        0.77
mascot       0.76
Activations Density 0.015%
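
To put the density figure in perspective, here is a minimal sketch. It assumes "activations density" means the fraction of tokens on which the feature's activation is non-zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with activation strictly above zero.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 tokens out of 20,000.
acts = [0.0] * 20000
for i in (5, 900, 15000):
    acts[i] = 1.2

density = activation_density(acts)
print(f"{density:.3%}")  # 0.015%
```

Under that assumption, a density of 0.015% corresponds to roughly 1.5 active tokens per 10,000 tokens, which suggests the feature fires very rarely.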