Negative Logits
  P             0.51
  crim          0.47
  criminally    0.46
  con           0.43
  ,             0.43
  as            0.42
  over          0.42
  g             0.42
  crime         0.42
  criminal      0.41

Positive Logits
  irrahim       0.50
  溘            0.50
  وهذه          0.48
  睄            0.45
  yanlı         0.41
  యో            0.41
  ума           0.41
  ванию         0.41
  暐            0.41
  இவை           0.40

Activation density: 0.004%
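The panel does not say how its lists were produced. A common convention, assumed here and not confirmed by the source, is that a feature's logit effect on each token is the dot product of the feature's decoder direction with that token's unembedding vector. Under that convention, the most positive and most negative effects fill the two lists. Activation density is usually the fraction of tokens on which the feature fires, so 0.004% corresponds to a fraction of 0.00004. The sketch below uses pure Python. The names `logit_effects`, `top_and_bottom`, and `activation_density` are hypothetical helpers, not any library's API.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed convention: effect on a token = decoder direction . token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature's activation is strictly positive."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)
```

With a toy two-dimensional model, for example, `logit_effects([0.5, 0.2], {"a": [1.0, 0.0], "c": [-1.0, 0.0]})` gives `"a"` an effect of 0.5 and `"c"` an effect of -0.5. A dashboard would then report `"a"` under positive logits and `"c"` under negative logits. It may list the negative values as magnitudes, as this panel appears to do.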