Explanations
titles and roles associated with authority and positions of power
Negative Logits
roud        -0.16
cmc         -0.16
éĻIJ         -0.15
linger      -0.15
endencies   -0.14
agal        -0.14
tid         -0.14
éĤ          -0.14
ancel       -0.14
olie        -0.14
Positive Logits
called      0.18
652         0.18
apt         0.17
lege        0.16
“           0.15
langu       0.15
ãĤīãģĦ      0.14
ãĤ·ãĤ¢      0.14
"           0.14
(s          0.14
Activations Density 0.420%