Explanations
phrases related to authority or positions of leadership
Negative Logits
odd        -0.18
inke       -0.17
ķĮ         -0.15
Cruc       -0.15
cla        -0.15
dương      -0.14
õi         -0.14
signed     -0.14
ughters    -0.14
anz        -0.13
Positive Logits
ruk        0.15
elijk      0.15
_ONCE      0.14
asia       0.14
unct       0.14
azi        0.14
plist      0.14
ä¹ħä¹ħ     0.14
azzi       0.14
ilst       0.13
Activations Density 0.007%