Explanations
terms related to social hierarchies and class systems
Negative Logits
Token      Logit
ém         -0.15
ladu       -0.15
eree       -0.15
steen      -0.14
.flag      -0.14
ÑĨин       -0.14
ynes       -0.14
ewe        -0.14
_descr     -0.13
importe    -0.13
Positive Logits
Token      Logit
302        0.17
allet      0.17
NH         0.16
567        0.15
145        0.15
IFF        0.15
444        0.14
271        0.14
701        0.14
Bris       0.14
Activations Density 0.101%
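
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only: it assumes "fires" means an activation strictly greater than zero, and the helper name `activation_density` and the synthetic `sample` array are hypothetical, not taken from the dashboard or its tooling.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the fraction of entries whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("activations must contain at least one value")
    return float(np.count_nonzero(acts > threshold)) / acts.size


# Synthetic illustration (not real data): 101 active tokens out of 100,000.
sample = np.zeros(100_000)
sample[:101] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.101%
```

Under this reading, a density of 0.101% means the feature is sparse: it is active on roughly one token in a thousand.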