Negative Logits
ᠣ  0.42
megfe  0.41
があります  0.40
がございます  0.39
faint  0.39
comerci  0.38
ừng  0.38
uniti  0.38
acquainted  0.37
unin  0.37
Positive Logits
allowed  1.44
allowed  1.23
Allowed  1.22
permitted  1.18
Allowed  1.07
允许  1.00
prohibited  1.00
disallowed  0.99
允許  0.99
permitido  0.99
Activation Density: 0.045%