Explanations
phrases related to progress or advancement
phrases related to progress or actions taken towards improvement
Negative Logits
Token       Logit
ecause      -0.80
inately     -0.78
ores        -0.77
iqueness    -0.75
licts       -0.75
rums        -0.75
eatures     -0.74
nces        -0.72
licted      -0.71
liction     -0.70
Positive Logits
Token       Logit
forward      1.22
toward       1.09
towards      1.07
backward     1.00
daughter     0.98
forwards     0.97
steps        0.93
backwards    0.92
step         0.91
Forward      0.84
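Dashboards of this kind commonly derive the positive and negative logit lists by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that method; the function names `logit_effects` and `top_logits`, and the toy weights in the usage example, are hypothetical and are not taken from this dashboard.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction (length d_model)
    onto an unembedding matrix W_U laid out as d_model rows x vocab columns."""
    vocab = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab)
    ]

def top_logits(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive

# Toy usage with made-up weights (d_model = 2, vocab = 3).
decoder_dir = [1.0, 0.5]
W_U = [[1.0, -1.0, 0.2],
       [0.4, 0.0, -2.0]]
tokens = ["forward", "ecause", "daughter"]
neg, pos = top_logits(logit_effects(decoder_dir, W_U), tokens, k=1)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; a real vocabulary would call for a vectorized matrix product rather than nested Python loops.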
Activation Density: 0.036%
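Activation density is usually reported as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; both the threshold and the function name `activation_density` are illustrative assumptions, not this dashboard's documented definition.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0 by default)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy usage: 36 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(36):
    acts[i] = 1.5
density = activation_density(acts)
```

Under this definition, a density of 0.036% corresponds to roughly 36 active positions per 100,000 tokens, so the feature fires only rarely.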