Explanations
references to leaving a position or place
occurrences of the word "the" in various contexts related to positions or locations
Negative Logits
GW            -0.75
MN            -0.71
fixes         -0.71
ML            -0.70
Cosponsors    -0.69
NESS          -0.66
Stats         -0.65
Brach         -0.65
Rail          -0.65
VEL           -0.63
Positive Logits
undone        0.99
voic          0.86
untouched     0.85
unfinished    0.81
intact        0.81
unprotected   0.78
unatt         0.78
footprints    0.75
hyde          0.74
vacuum        0.73
Activations Density: 0.131%