Explanations
instances of the word "rather" indicating preference or comparison
Negative Logits
swer      -0.18
sse       -0.15
ray       -0.15
vla       -0.15
Arbor     -0.15
ys        -0.14
system    -0.14
ital      -0.14
initely   -0.14
barely    -0.14
Positive Logits
than      0.40
-than     0.29
than      0.28
THAN      0.28
_than     0.27
Than      0.27
Than      0.26
než       0.24
än        0.22
quam      0.22
Activations Density 0.016%