INDEX
Explanations
code syntax and programming constructs
Negative Logits
Token           Logit
(__('           -0.67
['$             -0.61
Weiss           -0.61
'@/             -0.61
Appel           -0.61
Hess            -0.60
éte             -0.59
FontWeight      -0.58
jina            -0.58
lobo            -0.58
Positive Logits
Token           Logit
↵               1.27
</tr>           1.13
↵↵              1.05
<eos>           0.99
[toxicity=0]    0.88
↵↵↵             0.88
↵↵↵↵↵           0.84
↵↵↵↵            0.79
())))           0.77
↵↵↵↵↵↵          0.76
Activations Density 0.451%