    Negative Logits
    Token         Logit
    " cry"        -0.06
    "ิลล"         -0.06
    " Patterns"   -0.06
    "din"         -0.06
    (blank)       -0.06
    ".fx"         -0.06
    "fe"          -0.06
    " correct"    -0.06
    " одно"       -0.06
    "üyorum"      -0.06

    Positive Logits
    Token         Logit
    " severity"   0.11
    "severity"    0.10
    " Severity"   0.08
    "assistant"   0.08
    " كر"         0.07
    " зап"        0.07
    "Rem"         0.06
    " гром"       0.06
    "ixels"       0.06
    "_leader"     0.06
    Activation Density: 0.014%

    No Known Activations