Explanations
References to the concept of being the only one or having exclusive authority
Negative Logits
ses      -0.17
olini    -0.16
rieg     -0.16
tok      -0.15
oning    -0.15
Lopez    -0.15
ug       -0.14
shaw     -0.14
ypes     -0.14
alog     -0.14
Positive Logits
proprietor    0.21
tons          0.20
başına        0.19
/single       0.19
-source       0.18
mn            0.18
-purpose      0.18
uvre          0.18
purpose       0.16
pagen         0.16
Activations Density: 0.011%