INDEX
Explanations
phrases related to physical distance or separation
instances of the word "away"
Negative Logits
efe          -0.70
Chung        -0.66
sshd         -0.65
ured         -0.62
Ellison      -0.62
emort        -0.61
Perkins      -0.61
ãĤ³          -0.60
Butterfly    -0.60
erity        -0.60
Positive Logits
fitting      0.75
ments        0.72
coming       0.71
rent         0.69
leagues      0.68
world        0.68
away         0.68
fits         0.67
ILA          0.67
irts         0.66
Activations Density: 0.047%