INDEX
Explanations
phrases indicating proximity or nearness
Negative Logits
ュ          -0.18
енко       -0.16
ulong      -0.16
edb        -0.16
edo        -0.15
urator     -0.15
gan        -0.15
gregator   -0.15
мотреть    -0.15
eff        -0.15
Positive Logits
ness       0.27
shore      0.27
ctic       0.26
abouts     0.26
misses     0.25
by         0.25
đây        0.24
lier       0.24
s          0.24
-term      0.24
Activations Density 0.028%
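
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the share of token positions on which the feature fires at all. It assumes density means "percentage of activations strictly above zero"; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the fraction of token positions where the
    feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 100,000 token positions, 28 of which activate the feature.
sample = [0.0] * 99_972 + [1.5] * 28
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.028% corresponds to roughly 28 active positions per 100,000 tokens, which is what the made-up sample is sized to reproduce.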