INDEX
Explanations
phrases indicating feelings, capabilities, and comparisons about quality and improvement
Negative Logits
larger      -0.23
longer      -0.21
Longer      -0.20
narrower    -0.20
Larger      -0.20
heavier     -0.20
bigger      -0.19
harder      -0.18
smaller     -0.17
wider       -0.17
Positive Logits
bet           0.40
BET           0.38
infinitely    0.31
better        0.30
Bett          0.29
bets          0.28
batter        0.28
MUCH          0.28
WAY           0.27
Much          0.27
Activations Density: 0.160%