INDEX
    Explanations

    numerical statistics or data points related to performance metrics

    Negative Logits
    Token   Logit
    499     -0.25
    501     -0.23
    335     -0.21
    251     -0.20
    332     -0.20
    601     -0.20
    502     -0.19
    249     -0.19
    334     -0.19
    399     -0.18
    Positive Logits
    Token   Logit
    667     0.32
    857     0.28
    714     0.25
    571     0.25
    167     0.24
    429     0.24
    833     0.23
    333     0.23
    286     0.23
    750     0.22
    Activation Density: 0.016%

    No Known Activations