INDEX
Explanations
prepositions and phrases indicating action or direction
Negative Logits
ify         -0.19
nt          -0.18
nip         -0.18
t           -0.18
pedia       -0.18
nap         -0.17
ingly       -0.17
oretical    -0.16
rin         -0.15
rl          -0.15
Positive Logits
wner        0.26
asters      0.22
ffee        0.21
ledo        0.21
asty        0.21
ppers       0.20
pline       0.20
eh          0.19
asts        0.19
/from       0.19
Activations Density 0.090%