Explanations
verbs and phrases indicating action or involvement
Negative Logits
Ont      -0.20
ONT      -0.19
okud     -0.17
Ont      -0.17
onto     -0.17
فوق      -0.16
bes      -0.16
опри     -0.16
anova    -0.15
sebou    -0.14
Positive Logits
on       0.49
upon     0.28
на       0.26
على      0.23
på       0.21
pada     0.21
on       0.20
on       0.19
.on      0.19
ona      0.19
Activations Density 0.325%