INDEX
NEGATIVE LOGITS
Token       Logit
ورية        0.42
集成        0.41
interplay   0.39
دليل        0.39
dil         0.38
d           0.38
பிரமி       0.38
свя         0.37
Tired       0.37
Granted     0.37
POSITIVE LOGITS
Token       Logit
aligning    1.34
alignment   1.31
aligned     1.26
align       1.20
Alignment   1.20
Align       1.18
aligns      1.16
Align       1.10
align       1.09
alignment   1.09
Activations Density: 0.007%