INDEX
Explanations
Antisemitic content, particularly text that mentions Jews in a negative or conspiratorial context.
Negative Logits
PEM             -0.07
少              -0.06
predomin        -0.06
HID             -0.06
LinearGradient  -0.06
dığ             -0.06
Sly             -0.06
(Unit           -0.06
долж            -0.06
=url            -0.06
Positive Logits
apis            0.06
Spir            0.06
onClick         0.06
.Cells          0.06
стория          0.06
_Set            0.06
تحص             0.06
routing         0.06
μοί             0.06
yc              0.06
Activations Density 0.010%