Explanations
phrases that indicate upward movement or progress
Negative Logits
Token        Logit
downhill     -0.32
downwards    -0.25
downward     -0.25
decreasing   -0.24
decreased    -0.23
Down         -0.22
down         -0.22
Decre        -0.22
Decre        -0.22
down         -0.21
Positive Logits
Token        Logit
rise         0.49
rises        0.44
rising       0.43
climb        0.42
ascent       0.42
risen        0.42
climbing     0.41
Rise         0.40
raise        0.40
-rise        0.40
Activations Density: 0.221%