INDEX
Negative Logits
unpleasant  0.39
उपन्यास  0.39
disagreeable  0.39
pleasures  0.38
PLAIN  0.38
philanthropic  0.37
ጇ  0.36
Ple  0.36
ент  0.35
臨界  0.35
Positive Logits
помещение  0.43
ronaut  0.40
Лю  0.38
ருக  0.38
라우  0.38
माने  0.37
помещения  0.37
handleClick  0.37
eyse  0.37
वाहनों  0.36
Activations Density 0.000%