Explanations
phrases indicating direction or movement
Negative Logits
Token            Logit
umba             -0.16
smith            -0.15
acomp            -0.15
shim             -0.14
Preconditions    -0.14
Forge            -0.14
Noel             -0.14
Beled            -0.14
ted              -0.14
eda              -0.14
Positive Logits
Token            Logit
Å®               0.15
.postMessage     0.15
имв              0.15
گرد              0.14
Barton           0.14
ãĥ¼ãĥĨ           0.14
725              0.13
365              0.13
PUB              0.13
Mim              0.13
Activations Density: 0.135%