INDEX
Explanations
titles or roles associated with religious and authoritative figures
Negative Logits
ensi      -0.15
quent     -0.15
_workers  -0.15
ault      -0.15
_SECOND   -0.14
nable     -0.14
ensis     -0.14
Attrib    -0.13
gaard     -0.13
ODEV      -0.13
Positive Logits
們        0.16
们        0.15
erna      0.15
hood      0.14
(s        0.14
orum      0.14
орот      0.14
-elect    0.13
John      0.13
fü        0.13
Activations Density: 0.137%