    Negative Logits
    " ape"           -0.06
    "osemite"        -0.06
    "Date"           -0.06
    "angs"           -0.06
    " denounced"     -0.06
    "lst"            -0.06
    "Feed"           -0.06
    " CAT"           -0.06
    "imizer"         -0.06
    "achers"         -0.06
    Positive Logits
    (token not rendered)  0.07
    " LinearGradient"     0.06
    "(stderr"             0.06
    "ível"                0.06
    " Pri"                0.06
    " سرمایه"             0.06
    " hardships"          0.06
    "/send"               0.06
    " pracov"             0.06
    "_trait"              0.06
    Activation Density: 0.021%

    No Known Activations