    Negative Logits
    Token            Logit
    " opet"          -0.08
    " episodes"      -0.07
    "Parcel"         -0.07
    " chut"          -0.07
    " Chandigarh"    -0.07
    " ambientes"     -0.07
    " muddo"         -0.07
    " banho"         -0.07
    " orthodont"     -0.07
    ".ua"            -0.07
    Positive Logits
    Token            Logit
    " ignoring"      0.09
    " края"          0.09
    "_ignore"        0.09
                     0.09
    "imwe"           0.09
    "Ignore"         0.08
    "ونية"           0.08
    "Cum"            0.08
    " الإنجليزية"    0.08
    " maîtrise"      0.08
    Activation Density: 0.001%

    No Known Activations