INDEX
Negative Logits
Rewards    0.63
Reporting  0.58
\",        0.57
deen       0.57
else       0.57
?',        0.55
other      0.55
dagog      0.54
such       0.54
}}\|       0.54
Positive Logits
francês    0.74
хими       0.69
Af         0.69
کل         0.66
animated   0.65
鏝         0.65
ني         0.64
য়ার        0.63
讃         0.63
ERR        0.63
Activations Density: 0.000%