INDEX
    Explanations
    Negative Logits
    </strong>     0.89
    </h3>         0.81
    </h6>         0.77
     "");         0.75
     rozpoczę     0.74
                  0.73
    ли            0.72
    </h1>         0.72
    </h5>         0.72
    </code>       0.71
    Positive Logits
    Improve       0.80
                  0.78
    ر             0.73
    改善          0.73
    ب             0.73
     amélior      0.70
    improvement   0.69
     Improve      0.69
    اد            0.68
    zmq           0.66
    Activation Density: 0.044%

    No Known Activations