INDEX
Explanations
concepts related to authority or dominance
Negative Logits
moz      -0.16
wards    -0.15
Ale      -0.14
skill    -0.14
mos      -0.14
Unused   -0.14
ynam     -0.13
Holt     -0.13
zi       -0.13
635      -0.13
Positive Logits
supreme  0.36
Supreme  0.29
reign    0.23
ited     0.20
over     0.20
ite      0.18
terror   0.18
ITE      0.17
monarch  0.17
iting    0.17
Activations Density 0.026%