INDEX
Explanations
instances of the word "to" followed by a verb
instances of the word "to" indicating directional action or intentions
Negative Logits
selves       -0.90
enance       -0.66
entit        -0.64
nav          -0.64
fortunately  -0.64
terday       -0.63
eanor        -0.63
angered      -0.63
heast        -0.63
ankind       -0.61
Positive Logits
bed          0.99
pless        0.96
extremes     0.95
lengths      0.93
jail         0.91
sleep        0.90
hell         0.77
grips        0.77
warp         0.75
bat          0.75
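One common way feature dashboards produce negative and positive logit lists like the two above is to project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the result. This page does not state how its values were computed, so the sketch below is an assumption: it shows only that projection-and-rank idea, ignores any normalization the dashboard may apply to the displayed numbers, and uses hypothetical names (`top_logits`, `decoder_direction`, `unembed`). Plain Python lists stand in for real model weights.

```python
def top_logits(decoder_direction, unembed, vocab, k=10):
    """Rank tokens by the feature's direct effect on the logits.

    ASSUMPTION: the logit score of a token is the dot product of the
    feature's decoder direction (length d_model) with that token's
    column of the unembedding matrix (d_model rows x vocab_size columns).
    """
    d_model = len(decoder_direction)
    scores = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # tokens the feature suppresses most
    positive = ranked[::-1][:k]      # tokens the feature promotes most
    return positive, negative


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    direction = [1.0, 0.0]
    unembed = [[0.5, -1.0, 2.0],
               [3.0, 3.0, 3.0]]
    pos, neg = top_logits(direction, unembed, vocab, k=1)
    print(pos, neg)
```

With a real model, `unembed` would have tens of thousands of columns and `k=10` would give two ten-entry lists shaped like the ones above.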
Activation density: 0.119%
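Activation density is usually read as the percentage of tokens in a sample on which the feature fires at all, so 0.119% corresponds to about 119 of every 100,000 tokens. The minimal sketch below assumes that definition and a strict "greater than zero" firing threshold; the dashboard's actual sampling corpus and threshold are not given here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature is active.

    ASSUMPTION: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # One active token out of four gives a density of 25%.
    print(activation_density([0.0, 0.0, 1.2, 0.0]))
```

A sparse feature like this one would return a value near 0.119 when run over a large token sample.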