INDEX
NEGATIVE LOGITS
B    0.47
رب    0.46
were    0.45
לאחר    0.45
¿    0.44
Corr    0.43
belieb    0.43
هم    0.42
принад    0.42
utter    0.41
POSITIVE LOGITS
razioni    0.48
iation    0.45
Ꮘ    0.45
ậy    0.45
iato    0.44
лизация    0.43
kuma    0.42
larini    0.42
feasts    0.42
タニ    0.42
Activations Density 0.001%