Explanations
comparative phrases indicating differences
comparisons between entities or concepts
Negative Logits
gro         -0.54
(>          -0.53
ustain      -0.53
spr         -0.53
itans       -0.53
ources      -0.52
Deadline    -0.51
ioch        -0.51
aturday     -0.51
ilda        -0.51
Positive Logits
differently  2.03
different    1.86
different    1.74
similar      1.53
opposite     1.46
Different    1.43
worse        1.35
identical    1.33
similarly    1.31
simpler      1.30
Activations Density: 0.970%