Explanation
Directional terms and movement-related phrases
Negative Logits
Token     Logit
acha      -0.17
navr      -0.16
kers      -0.15
dob       -0.14
olie      -0.14
ousand    -0.14
eland     -0.14
KER       -0.14
utral     -0.14
thood     -0.14
Positive Logits
Token     Logit
wards     0.18
itches    0.15
911       0.14
LOCKS     0.14
AGMA      0.14
çe        0.14
Rap       0.14
пу        0.14
ward      0.14
幸        0.13
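The dashboard does not say how these two lists are produced. A common approach for SAE features is to project the feature's decoder direction through the model's unembedding matrix and take the k lowest and k highest entries as the "negative" and "positive" logits. The sketch below assumes that method. The function name `top_logit_effects`, the argument shapes, and the toy weights are all hypothetical and are not taken from the actual model.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption (hypothetical): the effect on each token is the dot product of
    the feature's decoder direction with that token's unembedding column.
    w_dec_row: shape (d_model,); w_unembed: shape (d_model, vocab_size).
    """
    effects = w_dec_row @ w_unembed          # shape (vocab_size,)
    order = np.argsort(effects)              # indices sorted by effect, ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]        # most negative first
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]  # most positive first
    return negative, positive

# Toy example with random, made-up weights (not the real model's parameters).
rng = np.random.default_rng(0)
vocab = ["wards", "ward", "acha", "navr", "kers"]
w_dec_row = rng.normal(size=8)
w_unembed = rng.normal(size=(8, len(vocab)))
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Under this assumption, a value such as 0.18 for "wards" means that the feature's direction, when written into the residual stream, raises that token's logit by about 0.18 per unit of feature activation, ignoring any later layers.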
Activation density: 0.061%
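An activation density of 0.061% means the feature is active on roughly 61 of every 100,000 tokens, or about 1 token in 1,640. The sketch below shows that calculation. It assumes that "active" means an activation strictly above a threshold of 0. The helper name `activation_density` and the synthetic data are illustrative only.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    Assumption (hypothetical): a token counts as active when its activation is > threshold.
    """
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Synthetic data: 61 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:61] = 1.0
print(f"{activation_density(acts):.3f}%")  # → 0.061%
```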