INDEX
Explanations
titles and positions of authority within organizations
Negative Logits
hab      -0.15
alles    -0.15
ibri     -0.15
ÉĻ       -0.15
zÅij     -0.14
.rule    -0.14
pis      -0.14
atty     -0.14
ocity    -0.13
inne     -0.13
Positive Logits
inker     0.17
Yen       0.15
erman     0.14
ayım      0.14
ichern    0.14
íĺķ       0.14
horizon   0.14
ilot      0.13
Bare      0.13
Template  0.13
Activations Density 0.022%