INDEX
Explanations
phrases indicating conditions or stipulations related to actions or situations
Negative Logits
cum      -0.15
UBY      -0.15
orman    -0.15
agnost   -0.15
emark    -0.14
окон     -0.14
427      -0.14
unda     -0.14
okrat    -0.13
asmus    -0.13
Positive Logits
\Bridge    0.16
oup        0.16
سلام       0.16
ibar       0.15
Copp       0.15
bic        0.15
ancestral  0.15
yntax      0.15
indle      0.14
ќ          0.14
Activations Density 0.008%