INDEX
    NEGATIVE LOGITS
    " regul"     -0.07
    " Kens"      -0.06
    " Ion"       -0.06
    "<boolean"   -0.06
    " ================================================="  -0.06
    " joint"     -0.06
    " race"      -0.06
    " Ski"       -0.06
    " suffer"    -0.06
    "osta"       -0.06
    POSITIVE LOGITS
    "itled"      0.07
    "ecut"       0.06
    "τικές"      0.06
                 0.06
    "ویش"        0.06
    " разом"     0.06
    "velle"      0.06
    "news"       0.06
    "องท"        0.06
    "printf"     0.06
    Act Density 0.000%

    No Known Activations