INDEX
Explanations
instances of the word "from" and phrases indicating choices or origins
Negative Logits
mination    -0.84
matter      -0.84
mouth       -0.84
riter       -0.81
scient      -0.77
visor       -0.76
notice      -0.76
haw         -0.74
hai         -0.74
ended       -0.74
Positive Logits
preset      0.90
among       0.89
assorted    0.87
amongst     0.86
various     0.81
whichever   0.80
options     0.79
afar        0.78
dozens      0.78
available   0.77
Activations Density: 0.023%