INDEX
Explanations
instances of the word "to" followed by a verb
occurrences of the word "to" and its variations in context
Negative Logits
pite         -0.78
amins        -0.75
irez         -0.68
owa          -0.68
hesda        -0.65
headers      -0.65
redits       -0.63
works        -0.63
incurred     -0.63
quartered    -0.63

Positive Logits
him           0.89
us            0.77
them          0.70
inquire       0.69
reporters     0.69
me            0.65
psy           0.64
someone       0.63
talk          0.63
strangers     0.63
Activations Density: 0.051%
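
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled positions where the
    feature fires at all (activation > threshold).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which fire.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would indicate a sparse feature that fires on only a small fraction of tokens, which fits a feature tied to one specific word in context.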