INDEX
NEGATIVE LOGITS
奭            0.41
idols        0.40
onyms        0.38
vänd         0.37
Sil          0.37
roses        0.37
invasions    0.37
ouses        0.37
傀            0.36
TG           0.36
POSITIVE LOGITS
етка         0.47
Rowe         0.43
Rockwell     0.41
Bhand        0.40
bap          0.39
Lips         0.38
Ⴊ            0.38
不在           0.38
чается       0.38
కాదు         0.37
Activations Density 0.000%