    NEGATIVE LOGITS
    Token          Value
    " on"          1.25
    " for"         1.23
    "0"            1.16
    "the"          1.12
    "t"            1.07
    "ted"          1.06
    (not shown)    1.02
    " or"          1.02
    " at"          0.99
    " if"          0.95
    POSITIVE LOGITS
    Token          Value
    (not shown)    1.48
    "ون"           1.41
    "ية"           1.34
    "را"           1.34
    (not shown)    1.33
    "و"            1.30
    "λή"           1.30
    "اد"           1.26
    "وين"          1.23
    "কে"           1.20
    Act Density 0.006%

    No Known Activations