INDEX
Explanations
numerical statistics or data points related to performance metrics
Negative Logits
499    -0.25
501    -0.23
335    -0.21
251    -0.20
332    -0.20
601    -0.20
502    -0.19
249    -0.19
334    -0.19
399    -0.18
Positive Logits
667    0.32
857    0.28
714    0.25
571    0.25
167    0.24
429    0.24
833    0.23
333    0.23
286    0.23
750    0.22
Activations Density: 0.016%