Explanations
references to individuals and their relationships
Negative Logits
Token       Logit
Jew         -0.23
Jewish      -0.21
Jews        -0.17
Äįet        -0.16
McCabe      -0.16
Jude        -0.15
jerne       -0.15
quan        -0.15
ÙĨج         -0.15
Judaism     -0.14
Positive Logits
Token       Logit
Ya           0.30
Yo           0.29
Av           0.28
Mos          0.27
Ya           0.26
Men          0.25
Nissan       0.25
Yak          0.24
It           0.24
Mos          0.23
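Feature dashboards commonly derive negative and positive logit lists like the ones above by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The exact pipeline behind these particular numbers is not stated here, so what follows is only a minimal pure-Python sketch under that assumption. The function name `feature_logit_effects`, its parameters, and the toy matrix and vocabulary are all hypothetical and made up for illustration.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Hypothetical helper: project one feature's decoder direction
    through an unembedding matrix and rank tokens by the result.

    decoder_direction: length d_model.
    unembed: d_model rows, each of length len(vocab).
    Returns (k most negative tokens, k most positive tokens).
    """
    d_model = len(decoder_direction)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per residual dimension")
    # Direct logit effect on token t: sum_i decoder_direction[i] * unembed[i][t]
    logits = [
        sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy numbers, made up for illustration: d_model = 2, vocabulary of 4 tokens.
    direction = [1.0, -0.5]
    W_U = [
        [0.2, -0.1, 0.3, 0.0],
        [0.4, 0.2, -0.2, 0.1],
    ]
    neg, pos = feature_logit_effects(direction, W_U, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

The explicit loop keeps the arithmetic visible. Over a real vocabulary, the same computation would normally be a single matrix-vector product in a numerical library.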
Activation Density: 0.059%
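Activation density is usually reported as the share of tokens in a sample on which the feature fires. A density of 0.059% corresponds to roughly 59 activating tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero. The dashboard's actual threshold and sample are not given here, and the `activation_density` helper is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose feature activation
    exceeds `threshold` (assumed to be 0.0 for a ReLU-style feature)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Synthetic sample: 59 firing tokens out of 100,000.
    sample = [0.0] * 99_941 + [1.0] * 59
    print(f"{activation_density(sample):.3f}%")
```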