INDEX
    Explanations
    Negative Logits
    roid            -0.07
     Brain          -0.07
     جوان           -0.06
    quivos          -0.06
     appropri       -0.06
     thumb          -0.06
     Caul           -0.06
     Mare           -0.06
     erroneous      -0.06
    عار             -0.06
    Positive Logits
    카라            0.07
     Sect           0.06
     svc            0.06
     NG             0.06
     belongings     0.06
     Check          0.06
     "`             0.06
     pyt            0.06
    Translations    0.06
     knack          0.06
    Activation Density: 0.541%

    No Known Activations