Explanations
actions followed by prepositions
Negative Logits
そして    1.06
그리고    1.02
(!)    1.00
그리고    0.99
หรือ    0.98
或者    0.96
или    0.95
或者是    0.94
وم    0.92
或其他    0.92
Positive Logits
via    1.37
with    1.29
through    1.26
from    1.23
on    1.18
within    1.17
alongside    1.12
across    1.12
without    1.11
against    1.11
Activations Density: 0.381%