INDEX
Explanations
references to anti-Semitism and related terms
Negative Logits
aney    -0.15
illard  -0.15
ucks    -0.15
Ħ       -0.15
Lump    -0.15
Mais    -0.14
icro    -0.14
.BLL    -0.14
rego    -0.14
addy    -0.14
Positive Logits
pter    0.19
endez   0.15
keit    0.15
ancial  0.14
ħn      0.14
_mpi    0.14
prech   0.14
metics  0.14
.biz    0.14
bat     0.14
Activations Density 0.002%