Explanations
references to institutions or identities associated with nationalities or citizenship
Negative Logits
egin     -0.18
ette     -0.15
Gang     -0.15
Arts     -0.14
ets      -0.14
TED      -0.14
ieri     -0.14
ive      -0.14
arius    -0.14
oto      -0.14
Positive Logits
isch      0.35
ische     0.32
ischer    0.32
ischen    0.26
isches    0.24
lands     0.18
ISH       0.18
ish       0.18
antro     0.18
iske      0.18
Activations Density 0.040%
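
As a rough illustration of the density figure, the sketch below assumes that "activations density" means the fraction of sampled tokens on which the feature fires, i.e. has an activation above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens sampled).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 4 of 10,000 tokens.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

Under this reading, a density of 0.040% would correspond to the feature firing on about 4 in every 10,000 tokens.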