INDEX
Explanations
phrases indicating uncertainty or indecision
Negative Logits
инов        -0.17
گاÙĨ        -0.17
illes       -0.16
takdir      -0.15
UNSIGNED    -0.15
illas       -0.14
ught        -0.14
urus        -0.14
_tF         -0.14
alc         -0.14
POSITIVE LOGITS
ruled       0.94
ruling      0.89
rule        0.87
Rule        0.79
rule        0.75
Rule        0.73
-rule       0.69
_rule       0.66
RULE        0.64
.rule       0.63
Activations Density 0.255%