INDEX
Explanations
references to walking or movement
repeated instances of the word "walk" in various forms
Negative Logits
encies       -0.78
iled         -0.68
afort        -0.66
nan          -0.65
iling        -0.65
nces         -0.65
igne         -0.63
cffff        -0.62
orporated    -0.62
iller        -0.61
Positive Logits
through      1.01
about        0.91
upright      0.90
away         0.89
ways         0.86
uphill       0.80
onstage      0.79
bare         0.78
way          0.76
bow          0.75
Activations Density: 0.032%