INDEX
    Explanations
    Negative Logits
    " Proz"          -0.10
    "heres"          -0.08
    "ifestyles"      -0.08
    "openh"          -0.08
    " sophist"       -0.07
    " HOM"           -0.07
    "اتهم"           -0.07
    " touted"        -0.07
    " edific"        -0.07
    "േള"             -0.07
    Positive Logits
    [token not shown] 0.08
    " difer"          0.08
    "-gl"             0.07
    " indirectly"     0.07
    " Poor"           0.07
    [token not shown] 0.07
    " ganado"         0.07
    " despi"          0.07
    [token not shown] 0.07
    "بھر"             0.07
    Act Density 0.003%

    No Known Activations
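The dashboard reports an activation density of 0.003% but lists no example activations. The sketch below assumes the usual reading of "Act Density": the fraction of sampled token positions where the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy 100,000-token sample are hypothetical illustrations, not the dashboard's own code or data. It shows that 0.003% corresponds to roughly 3 active positions per 100,000 tokens, which is sparse enough that a small sample could contain no activations at all.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is measured as (active positions) / (all positions)
    over some sample of tokens; the real dashboard's sampling is unknown.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 3 of them.
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.2

density = activation_density(acts)
print(f"{density:.3%}")  # 0.003%
```

At this density, a sample of a few thousand tokens would be expected to contain no firing positions, which is consistent with the "No Known Activations" note.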