INDEX
    Negative Logits
    Token            Logit
    " HIM"           -0.08
    " надеж"         -0.08
    " Cunningham"    -0.08
    " Finder"        -0.07
    " மக்கள்"        -0.07
    " zuerst"        -0.07
    (blank)          -0.07
    " 먼저"          -0.07
    " Lut"           -0.07
    "division"       -0.07
    Positive Logits
    Token            Logit
    "sat"            0.09
    " пись"          0.09
    "SHA"            0.08
    " predis"        0.08
    "Clinic"         0.08
    " undertaken"    0.08
    " antidepress"   0.08
    " outpatient"    0.08
    " torture"       0.08
    "orship"         0.08
    Activation Density: 0.002%

    No Known Activations