INDEX
Explanations
keywords related to antisemitism
references to antisemitism and related terms
Negative Logits
woods      -0.73
NAS        -0.65
aver       -0.65
abee       -0.64
avez       -0.64
Wiggins    -0.62
come       -0.62
OWER       -0.61
packing    -0.60
ateur      -0.60
Positive Logits
slurs         0.98
Semitism      0.91
ophobic       0.91
Semitic       0.87
ophobia       0.82
prejudice     0.80
tropes        0.73
sylv          0.72
itism         0.72
tendencies    0.71
Activations Density 0.014%
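
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard derives it. The function name activation_density, the threshold parameter, and the sample counts (14 firing positions out of 100,000) are hypothetical, chosen only so the result matches the 0.014% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The dashboard's own definition may differ.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 14 of 100,000 token positions.
sample = [0.0] * 99_986 + [1.5] * 14

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density this low means the feature is highly selective: it stays silent on almost all text and fires only on a narrow set of tokens.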