INDEX
    Explanations
    Negative Logits
     homosex        -0.07
     intimidated    -0.07
     SPORT          -0.06
    cstdint         -0.06
    _requested      -0.06
     Jehovah        -0.06
                    -0.06
    _density        -0.06
    american        -0.06
     Spin           -0.06
    Positive Logits
    .Man            0.07
     carrot         0.07
     occurring      0.07
    ="/             0.06
     lời            0.06
    CodeGen         0.06
    .optional       0.06
    izen            0.06
    hattan          0.06
    affen           0.06
    Act Density 0.035%

    No Known Activations