INDEX
Explanations
mentions of public figures involved in controversial statements or actions related to societal issues
Negative Logits
чат      -0.15
kuş      -0.14
overs    -0.14
eldre    -0.14
uis      -0.14
nou      -0.13
班       -0.13
loys     -0.13
arez     -0.13
NEY      -0.13
Positive Logits
yun      0.15
batim    0.15
prm      0.15
vide     0.15
Sof      0.14
icter    0.14
ばかり   0.14
lap      0.14
ogl      0.13
Sk       0.13
Activations Density 0.179%