Explanations
the word "rather" and its variants, indicating a focus on expressing preferences or comparisons
Negative Logits
swer      -0.17
ys        -0.14
itals     -0.14
al        -0.14
system    -0.14
ateg      -0.14
entre     -0.14
chg       -0.14
aneous    -0.14
ray       -0.14
Positive Logits
than      0.30
than      0.22
_than     0.21
-than     0.21
Than      0.21
THAN      0.20
než       0.20
-таки     0.20
Than      0.18
quam      0.18
Activations Density 0.014%
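
The logit lists and the density figure above can be illustrated with a minimal sketch. It assumes, without confirmation from this page, that the logit values come from projecting the feature's output direction through the model's unembedding matrix, and that density is the fraction of token positions where the feature's activation is above zero. The names `top_logit_effects`, `activation_density`, `decoder_direction`, and `unembedding` are hypothetical, and the numbers are toy values, not the real model's weights.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Rank tokens by the logit shift a feature direction induces (assumed method)."""
    # Project the feature's output direction through the unembedding matrix.
    effects = decoder_direction @ unembedding  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(activations):
    """Fraction of token positions where the feature is active (assumed: > 0)."""
    activations = np.asarray(activations)
    return float(np.mean(activations > 0))

# Toy setup: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
vocab = ["than", "quam", "system", "ray"]
unembedding = np.array([
    [1.0, 0.6, -0.5, 0.0],
    [0.0, 0.2, 0.1, -0.3],
])
decoder_direction = np.array([0.3, 0.1])

negative, positive = top_logit_effects(decoder_direction, unembedding, vocab, k=2)
density = activation_density([0.0, 0.0, 1.3, 0.0])

print("negative:", negative)
print("positive:", positive)
print(f"density: {density:.3%}")
```

Under these assumptions, a very low density such as 0.014% would mean the feature fires on only a small fraction of token positions.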