Explanations
words that denote leadership or refer to top-ranking entities or positions
Negative Logits
Token          Logit
se             -0.16
ych            -0.15
ment           -0.15
sphere         -0.14
a              -0.14
umble          -0.14
Bean           -0.14
maz            -0.14
environment    -0.14
pie            -0.14
Positive Logits
Token          Logit
-edge          0.21
Escort         0.18
ãĥ³ãĥĨ         0.16
Ľ°             0.15
ấp             0.15
-flight        0.15
strument       0.14
Escort         0.14
ierge          0.14
irut           0.14
Activations Density 0.010%