    Negative Logits
    Token           Logit
    "manual"        -0.07
    " Fixed"        -0.07
    "_DEL"          -0.07
    " trùng"        -0.07
    " vulgar"       -0.06
                    -0.06
    " Carm"         -0.06
    " VARIABLE"     -0.06
    " ecstatic"     -0.06
    "gregar"        -0.06
    Positive Logits
    Token           Logit
    "phot"           0.07
    " slash"         0.06
    "mallow"         0.06
    " bütün"         0.06
    "udev"           0.06
    " sacrificed"    0.06
                     0.06
    " ไทย"           0.06
                     0.06
    ";base"          0.06
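    Dashboards of this kind commonly derive such logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method. The function name `top_logit_effects`, the matrix shapes, and the toy vocabulary are hypothetical illustrations, not this page's data or pipeline.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption (hypothetical setup): decoder_dir has shape (d_model,),
    W_U has shape (d_model, d_vocab), and the feature's logit effect
    is decoder_dir @ W_U.
    """
    logits = decoder_dir @ W_U                      # shape (d_vocab,)
    order = np.argsort(logits)                      # ascending order
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy example: d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[-1.0, -0.5, 0.7, 0.6],
                [ 0.0,  0.0, 0.0, 0.0]])
decoder_dir = np.array([0.1, 0.0])
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

    Sorting once with `argsort` and reading both ends gives the bottom-k and top-k lists from a single pass over the vocabulary.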
    Activation density: 0.021%
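    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. Assuming that definition, 0.021% corresponds to about 21 firing tokens per 100,000. A minimal sketch, where the function name `activation_density` and the sample array are hypothetical:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: `acts` holds one activation value per sampled token.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical sample: 100,000 tokens, 21 of which fire.
acts = np.zeros(100_000)
acts[:21] = 1.5
print(f"{activation_density(acts):.3f}%")
```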

    No Known Activations