Explanations
references to Jewish identity and the Jewish community
Negative Logits
Ñıк        -0.16
jet        -0.15
extinction -0.15
imoto      -0.14
utton      -0.14
Dort       -0.14
731        -0.14
Reserve    -0.14
xef        -0.14
éĺ¶        -0.14
Positive Logits
ewish      0.35
ews        0.34
uda        0.32
EW         0.28
ewis       0.27
ew         0.26
ew         0.26
UDA        0.25
ewn        0.23
ude        0.23
Activations Density: 0.007%