INDEX
    Explanations
    Negative Logits
        " apprent"           -0.08
        "ivid"               -0.07
        "Portrait"           -0.07
        " GPA"               -0.07
        " Identification"    -0.07
        "تعاون"              -0.07
        " observational"     -0.07
        " Evangel"           -0.07
        " stayed"            -0.07
        "_VO"                -0.06
    Positive Logits
        " Może"              0.08
        "ود"                 0.08
        "że"                 0.07
        " אחרות"             0.07
        " humane"            0.07
        "老妈"               0.07
        " Đường"             0.07
        (no visible token)   0.07
        "ודות"               0.07
        " annotations"       0.06
    Activation Density: 0.000%

    No Known Activations