Explanations
phrases related to movement or progress
phrases indicating movement or progress towards a goal
Negative Logits
uster       -0.76
ropolitan   -0.69
usters      -0.69
livest      -0.67
tein        -0.62
itton       -0.61
iasco       -0.60
Tuc         -0.60
icion       -0.60
nect        -0.59
Positive Logits
fare        1.20
ward        0.95
finding     0.88
WARD        0.77
toward      0.77
step        0.76
steps       0.75
finder      0.73
towards     0.72
seeing      0.71
Activations Density 0.023%