INDEX
Explanations
directional and movement-related language
NEGATIVE LOGITS
Token              Logit
ElementException   -0.18
ÅĻÃŃd              -0.16
udd                -0.15
ingly              -0.15
uzey               -0.14
foy                -0.14
ëĬ                 -0.14
gi                 -0.14
ingo               -0.13
eniable            -0.13
POSITIVE LOGITS
Token      Logit
wards      0.24
ward       0.21
onto       0.21
into       0.20
WARD       0.17
onto       0.16
oward      0.16
towards    0.15
toward     0.15
Into       0.15
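
The page does not say how these logit lists are produced. One common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the tokens with the largest and smallest resulting values. The sketch below illustrates that idea with a toy vocabulary. The function name `top_logit_tokens`, the two-dimensional matrices, and the four-token vocabulary are all hypothetical and are not taken from the tool.

```python
import numpy as np

def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    Assumption: the logit effect of a feature is its decoder direction
    multiplied by the unembedding matrix (shape: d_model x n_vocab).
    """
    effects = decoder_dir @ unembed      # one value per vocabulary token
    order = np.argsort(effects)          # indices sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy data, made up for illustration only.
vocab = ["wards", "into", "udd", "the"]
unembed = np.array([[0.24, 0.20, -0.16, 0.0],
                    [0.00, 0.00,  0.00, 1.0]])
decoder_dir = np.array([1.0, 0.0])

positive, negative = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this assumption, the positive list names tokens the feature makes more likely and the negative list names tokens it makes less likely, which is consistent with the directional suffixes and prepositions shown above.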
Activations Density 0.158%
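
The page does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below computes that fraction. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the sample is constructed only so that the result matches the figure shown.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Made-up sample: 158 active tokens out of 100,000.
sample = [0.0] * 99_842 + [1.0] * 158
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse, firing on well under one percent of tokens.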