    Negative Logits
     i              -0.08
     Suicide        -0.07
    chemas          -0.07
    (U              -0.07
    AU              -0.07
    compat          -0.07
    avers           -0.07
     Genius         -0.07
    _connection     -0.07
     BaseModel      -0.06
    Positive Logits
     jclass         0.07
    ("..            0.06
    phyl            0.06
    ъек             0.06
     jestli         0.06
    otherwise       0.06
     meant          0.06
    .↵↵             0.06
     داده           0.06
     ↵↵             0.06
    Activation Density: 0.021%

    No Known Activations