INDEX
    NEGATIVE LOGITS
     presenta     -0.08
     그래서          -0.07
     kraje        -0.07
     oferta       -0.07
     resurgence   -0.06
     ق            -0.06
     combust      -0.06
     Những        -0.06
     fantas       -0.06
     έχουν        -0.06
    POSITIVE LOGITS
     namely       0.08
    HEME          0.07
     refers       0.07
    mail          0.07
     Lisa         0.06
    ARGS          0.06
    چی            0.06
    mund          0.06
    ACY           0.06
                  0.06
    Activation Density: 0.006%

    No Known Activations