INDEX
Explanations
comparative phrases that suggest improvement or superiority
Negative Logits
licity    -0.16
enco      -0.16
tol       -0.16
vid       -0.16
izzo      -0.15
iven      -0.15
adí       -0.14
QC        -0.14
_tol      -0.14
lish      -0.13
Positive Logits
than      0.40
THAN      0.30
Than      0.30
than      0.29
Than      0.28
nor       0.27
_than     0.25
než       0.23
except    0.22
-than     0.22
Activations Density 0.036%
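
As a rough illustration of how lists and figures like the ones above could be produced, the sketch below ranks (token, logit) pairs into top and bottom lists and computes a firing fraction. It is only a sketch under stated assumptions: the function names `top_and_bottom_tokens` and `activation_density` are hypothetical, the density is assumed to mean the fraction of tokens on which the feature has a positive activation (the page itself does not define it), and the activation data is made up for the example.

```python
from typing import List, Sequence, Tuple

TokenLogit = Tuple[str, float]


def top_and_bottom_tokens(
    pairs: Sequence[TokenLogit], k: int = 10
) -> Tuple[List[TokenLogit], List[TokenLogit]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    A list of pairs is used instead of a dict because the same surface
    string can appear more than once (e.g. two distinct "than" tokens).
    """
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    positive = ranked[:k]
    # Most negative value first, mirroring the ordering of the list above.
    negative = sorted(ranked[-k:], key=lambda p: p[1])
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens with a positive activation (assumed definition)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# A few values copied from the lists above.
sample = [
    ("than", 0.40),
    ("nor", 0.27),
    ("except", 0.22),
    ("licity", -0.16),
    ("lish", -0.13),
]
positive, negative = top_and_bottom_tokens(sample, k=2)

# Made-up activations: 9 firing tokens out of 25,000.
toy_activations = [0.0] * 24991 + [1.5] * 9
density = activation_density(toy_activations)
print(positive, negative, f"{density:.3%}")
```

With these toy numbers, 9 firing tokens out of 25,000 gives a density of 0.036%, which matches the scale of the figure reported above but says nothing about the real dataset size.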