Explanations
phrases indicating direction or intention
the preposition "to" and variations of its usage
Negative Logits
meant      -0.57
enough     -0.53
stru       -0.53
bruising   -0.53
messing    -0.53
coupled    -0.52
glitches   -0.51
compared   -0.51
vulner     -0.51
typ        -0.50
Positive Logits
ggles      1.44
wered      1.33
ilet       1.18
asted      1.16
ppers      1.15
asts       1.10
pless      1.07
pper       1.06
accompany  1.04
asting     1.04
Activations Density: 0.172%