Explanations
actions indicating change or movement
Negative Logits
katika    -0.59
nella     -0.53
于        -0.52
於        -0.52
kwenye    -0.46
nell      -0.46
trong     -0.43
presso    -0.42
aronder   -0.41
nelle     -0.41
Positive Logits
IN        0.82
in        0.79
进来      0.73
getIn     0.68
in        0.66
inn       0.66
IN        0.66
ins       0.64
进去      0.63
In        0.62
Activations Density 0.347%