INDEX
    Explanations
    Negative Logits
     dati
    -0.07
    _ASC
    -0.06
    _ser
    -0.06
     эффектив
    -0.06
     Sergeant
    -0.06
    Navigator
    -0.06
    -orange
    -0.06
    СТ
    -0.06
    -0.06
     yok
    -0.06
    Positive Logits
    まり
    0.07
     impaired
    0.07
    alse
    0.06
     unhappy
    0.06
     Avengers
    0.06
     Seam
    0.06
    Scalars
    0.06
     poorer
    0.06
     venture
    0.06
    adders
    0.06
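A minimal sketch of how logit lists like the two above are commonly produced. It assumes, which this page does not state, that each token's value is the feature's decoder direction dotted with that token's column of the unembedding matrix. `logit_effects`, `decoder_dir`, and `w_unembed` are hypothetical names, and the toy numbers are invented for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, float]

def logit_effects(decoder_dir: List[float],
                  w_unembed: List[List[float]],
                  vocab: List[str],
                  k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (most negative k, most positive k) (token, logit) pairs.

    Hypothetical helper: assumes w_unembed has shape [d_model][vocab_size]
    and that a token's logit effect is decoder_dir . w_unembed[:, token].
    """
    d_model = len(decoder_dir)
    # Project the decoder direction through the unembedding, one score per token.
    scores = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # largest scores first
    return negative, positive

# Toy usage with d_model = 2 and a four-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    w_unembed=[[0.1, 0.0, -0.2, 0.3],
               [0.2, -0.4, 0.0, 0.1]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print([t for t, _ in neg], [t for t, _ in pos])
```

Sorting the full score list is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` avoids the full sort.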
    Activation Density 0.014%
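A minimal sketch of the density figure, assuming it is the percentage of evaluated tokens on which the feature's activation is above zero (the page states neither the threshold nor the token sample used). `activation_density` is a hypothetical helper. Under that reading, 0.014% corresponds to about 14 activating tokens per 100,000.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# 14 firing tokens out of 100,000 evaluated tokens.
acts = [0.0] * 99_986 + [1.0] * 14
density = activation_density(acts)
print(f"{density:.3f}%")
```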

    No Known Activations