Negative Logits
Token           Value
behaviors       0.49
perilaku        0.47
atypical        0.46
comportamenti   0.46
berkualitas     0.44
nedeniyle       0.43
crackdown       0.43
PLATES          0.43
kisah           0.42
straordin       0.41

Positive Logits
Token           Value
centre          0.51
cả              0.48
Cái             0.47
všech           0.46
óry             0.45
rightly         0.44
วจ              0.44
fw              0.44
łada            0.43
both            0.43

Activation density: 0.020%