    Negative Logits
    "_comment"        -0.07
    "_constant"       -0.06
    "Nested"          -0.06
    " Kar"            -0.06
    (unrendered token) -0.06
    "opro"            -0.06
    "_description"    -0.06
    "اهرة"            -0.06
    " fundamental"    -0.06
    "arrass"          -0.06
    Positive Logits
    (unrendered token) 0.07
    " Aud"            0.06
    " uydu"           0.06
    (unrendered token) 0.06
    ".:"              0.06
    "Aud"             0.06
    " swearing"       0.06
    " vstup"          0.06
    " jmé"            0.06
    " ActionType"     0.06
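This sketch shows how top negative and positive logit lists like these are often computed for a sparse-autoencoder feature. It assumes the values come from projecting the feature's decoder direction through the model's unembedding matrix. The names `top_logits`, `decoder_dir`, `W_U`, and `vocab` are hypothetical, and this is not the dashboard's actual code.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: a feature's effect on each output token is approximated by
    decoder_dir @ W_U, i.e. its decoder direction (d_model,) projected
    through a hypothetical unembedding matrix W_U (d_model, vocab_size).
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)                          # ascending logit order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

For example, with a toy 4-token vocabulary and `W_U = np.eye(4)`, the decoder direction `[0.3, -0.2, 0.1, -0.5]` gives `("d", -0.5), ("b", -0.2)` as the two most negative tokens and `("a", 0.3), ("c", 0.1)` as the two most positive.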
    Activation Density: 0.014%

    No Known Activations
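Activation density is usually read as the percentage of sampled token positions on which the feature fires. This is a minimal sketch that assumes a feature "fires" when its activation is strictly above a threshold of zero. The names `activation_density` and `threshold` are hypothetical, and the dashboard's actual sampling and threshold are not stated on this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per sampled token position.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

At the density reported above, a feature would fire on about 14 of every 100,000 sampled tokens.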