INDEX
Negative Logits
Token         Logit
instruction   0.55
instructions  0.52
restrictions  0.51
צי            0.49
commanding    0.49
在            0.48
ים            0.48
domination    0.48
in            0.47
duplication   0.47
Positive Logits
Token          Logit
đừng           0.64
jangan         0.63
expect         0.59
ce             0.58
eské           0.57
underestimate  0.57
jangan         0.57
ėtų            0.55
長く           0.54
believe        0.53
Activations Density: 0.057%