INDEX
Explanations
phrases related to progress or advancement
references to progress or advancements measured in steps
Negative Logits
ecause      -0.82
liction     -0.77
eer         -0.72
nces        -0.70
elta        -0.67
licts       -0.67
iqueness    -0.65
inately     -0.63
licted      -0.63
egu         -0.63
Positive Logits
forward     1.29
backward    1.18
backwards   1.11
forwards    1.09
toward      1.07
towards     1.04
closer      0.99
Forward     0.99
frog        0.99
forward     0.96
Activations Density: 0.023%