INDEX
Negative Logits
Align       1.12
Im          0.91
Align       0.91
aligned     0.86
Alignment   0.85
Grounds     0.83
1           0.81
Divided     0.79
It          0.79
Those       0.79
Positive Logits
(no token shown)   1.36
第壹百              1.24
(no token shown)   1.20
(no token shown)   1.19
犃                 1.19
spowod             1.16
昦                 1.15
akibat             1.15
툐                 1.12
pOt                1.12
Activations Density 0.000%