INDEX
Explanations
references to clowns, specifically those in a negative or controversial context
references to clowns and their comedic portrayal
Negative Logits
Token        Logit
galitarian   -0.69
WARD         -0.67
AAF          -0.66
RA           -0.65
Ethiopian    -0.65
ECD          -0.65
equality     -0.64
ASED         -0.64
ç¥ŀ          -0.63
ãĥ´          -0.63
Positive Logits
Token        Logit
clown        1.00
Clown        1.00
fish         0.98
amn          0.79
icter        0.77
mascot       0.77
obyl         0.77
opard        0.76
pool         0.75
oon          0.75
Activations Density 0.007%
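
Two entries in the negative logits list (ç¥ŀ and ãĥ´) look like encoding debris, but they may simply be vocabulary tokens shown in a byte-level BPE display alphabet. The sketch below assumes a GPT-2-style byte-to-unicode mapping, which the dashboard itself does not confirm. The helper names `byte_to_unicode_table` and `decode_token` are hypothetical, written only for this illustration. Under that assumption the two tokens would decode to 神 and ヴ.

```python
def byte_to_unicode_table():
    """Rebuild a GPT-2-style byte -> display-character table.

    Assumption: the dashboard's tokenizer uses this byte-level BPE
    convention; the source does not state which tokenizer is used.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


def decode_token(token):
    """Map each display character back to its byte, then decode as UTF-8."""
    inverse = {ch: b for b, ch in byte_to_unicode_table().items()}
    raw = bytes(inverse[ch] for ch in token)
    # A token may hold only part of a multi-byte character, so invalid
    # sequences are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e7\u00a5\u0140", "\u00e3\u0125\u00b4", "clown"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as clown pass through unchanged, so the same helper can be applied to every row of both logit tables.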