INDEX
NEGATIVE LOGITS
trifling      0.49
Pages         0.44
hark          0.43
population    0.42
Number        0.41
population    0.41
denial        0.40
ุ             0.40
den           0.40
bit           0.40
POSITIVE LOGITS
摶            0.58
duckys        0.53
奀            0.50
gà            0.50
tede          0.50
стаў          0.50
tempHeader    0.49
ستم           0.49
órica         0.49
iselt         0.49
Activations Density 0.001%