    Negative Logits
    _owned  -0.07
     girl  -0.06
    بینی  -0.06
    cakes  -0.06
     опис  -0.06
    _alert  -0.06
     MEMBER  -0.06
     ADVISED  -0.06
     sinks  -0.06
     licensed  -0.06
    Positive Logits
     süreç  0.07
     الوقت  0.07
     натураль  0.07
     wyst  0.07
     лі  0.07
     ủy  0.06
     Voj  0.06
     поє  0.06
     spolup  0.06
     риз  0.06
    Activation Density 0.002%

    No Known Activations