INDEX
Explanations
phrases indicating distance or progress towards a goal
phrases indicating a journey or progress towards a goal
Negative Logits
pmwiki      -0.92
pecially    -0.73
ACTION      -0.66
ometimes    -0.65
Authors     -0.65
Guest       -0.64
bol         -0.63
cause       -0.62
IMAGES      -0.62
temp        -0.61
Positive Logits
proving      0.76
toward       0.75
separating   0.75
figuring     0.73
explaining   0.70
overcoming   0.68
awaited      0.67
towards      0.67
izons        0.67
recovering   0.66
Activations Density 0.097%