INDEX
Explanations
prepositions followed by pronouns or nouns indicative of direction or transfer
Negative Logits
natureconservancy   -0.82
tions               -0.81
Gap                 -0.68
dn                  -0.63
marine              -0.62
affili              -0.59
til                 -0.58
NOT                 -0.57
tion                -0.57
collar              -0.57

Positive Logits
by                  0.86
BY                  0.82
by                  0.77
ãĥ¯                 0.70
ãĥīãĥ©              0.69
ãĥĨãĤ£              0.69
Lama                0.66
andom               0.66
que                 0.66
å§«                 0.65
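
Several of the positive-logit tokens (ãĥ¯, ãĥīãĥ©, ãĥĨãĤ£, å§«) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes, not like encoding damage. The sketch below assumes, without confirmation from this page, that the model uses a GPT-2-style byte-to-character table; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the style of GPT-2's byte-level BPE (assumed)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to code points starting at 256.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(surface: str) -> str:
    """Map each stand-in character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in surface)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["ãĥ¯", "ãĥīãĥ©", "ãĥĨãĤ£", "å§«"]:
        print(token, "->", decode_token(token))
```

Under that assumption the four tokens decode to Japanese text (three katakana fragments and one kanji). If the underlying tokenizer is not byte-level BPE, the mapping does not apply and the tokens should be read as displayed.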
Activations Density 0.215%