INDEX
Explanations
prepositions
instances of the word "to."
Negative Logits
don        -0.76
differed   -0.74
irez       -0.73
Written    -0.72
didn       -0.68
ended      -0.67
^^         -0.66
didn       -0.66
rar        -0.65
puff       -0.64
Positive Logits
downright    0.88
outright     0.85
encompass    0.80
pless        0.73
asted        0.73
adulthood    0.71
DonaldTrump  0.70
wered        0.68
ensure       0.68
ilet         0.67
Activations Density: 0.069%