INDEX
Explanations
phrases emphasizing strong descriptors, particularly the word "very."
Negative Logits
orre    -0.17
стю     -0.15
оряд    -0.15
owed    -0.14
oris    -0.14
prot    -0.14
ory     -0.13
ustr    -0.13
utch    -0.13
ály     -0.13
Positive Logits
same        0.24
same        0.19
essence     0.18
SAME        0.18
thing       0.18
opposite    0.16
mention     0.15
existence   0.15
worst       0.15
790         0.15
Activations Density 0.019%
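
The page does not define the density figure above. One plausible reading, assumed here, is that it is the fraction of sampled token positions on which the feature has a non-zero activation. The sketch below works through that arithmetic on made-up data; the function name `activation_density`, the `threshold` parameter, and the sample counts are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 19 of 100,000 token positions.
sample = [1.5] * 19 + [0.0] * 99_981
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.019%
```

Under this reading, a density of 0.019% would mean the feature is active on roughly 19 out of every 100,000 tokens, which is consistent with a sparse feature tied to a narrow pattern.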