INDEX
Explanations
references to ideal conditions or scenarios
Negative Logits
Token    Logit
[        -0.73
…        -0.69
         -0.67
[        -0.64
(        -0.63
ET       -0.61
–        -0.61
<eos>    -0.58
….       -0.58
...      -0.57
Positive Logits
Token    Logit
ideal    2.16
ideal    2.14
Ideal    2.09
IDEAL    2.08
Ideal    2.05
idéal    1.82
idéale   1.80
ideale   1.78
ideales  1.65
理想     1.50
Activations Density: 0.064%