INDEX
Explanations
references to groups of individuals
Negative Logits
resa    -0.15
anta    -0.15
onna    -0.14
ament   -0.14
aryl    -0.14
ental   -0.13
arna    -0.13
anto    -0.13
entin   -0.13
ANTA    -0.13
Positive Logits
enger   0.15
lut     0.15
asaki   0.15
нил     0.14
orz     0.14
νÏĮ     0.14
Kramer  0.14
dür     0.13
prot    0.13
asu     0.13
Activations Density 0.016%