INDEX
Explanations
terms related to anti-Semitism and its various forms
Negative Logits
omed     -0.16
aily     -0.15
fully    -0.15
umbled   -0.15
τά       -0.15
nad      -0.14
rosso    -0.14
omer     -0.14
rgan     -0.14
ogue     -0.13
Positive Logits
EDIA      0.15
lesia     0.15
beck      0.15
ept       0.14
ofi       0.14
pter      0.14
Brushes   0.14
μορ       0.14
loat      0.13
Baker     0.13
Activations Density 0.002%