Explanations
significant nouns and phrases related to authority and agency
Negative Logits
.nano      -0.18
idge       -0.16
ucc        -0.15
inas       -0.15
ksen       -0.14
proceeds   -0.14
AMI        -0.14
ún         -0.13
Sweat      -0.13
thon       -0.13
Positive Logits
å¡Ķ        0.18
LAR        0.17
atcher     0.16
buah       0.15
sen        0.15
odel       0.15
Slides     0.14
Rotor      0.14
lar        0.14
uf         0.14
Activations Density 0.006%