    Negative Logits
    "man"               -0.08
    "Park"              -0.07
    "_race"             -0.07
    "or"                -0.07
    " Week"             -0.06
    " HOW"              -0.06
    "TextColor"         -0.06
    "_boundary"         -0.06
    " cuda"             -0.06
    " zoo"              -0.06
    Positive Logits
    " its"               0.16
    " Its"               0.16
    "Its"                0.13
    " ITS"               0.10
    " itself"            0.08
    "пис"                0.07
    " It"                0.07
    " Thi"               0.07
    " it"                0.07
    (token not shown)    0.07
    Activation Density: 0.109%

    No Known Activations
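The two logit lists and the density figure are summary statistics of a single feature. Below is a minimal sketch of how such numbers are commonly computed. It assumes the dashboard uses the usual convention: the feature's decoder direction is projected through the model's unembedding matrix (a "logit lens" on the decoder weights) and the most negative and most positive entries are reported. Activation density is taken to be the fraction of token positions where the feature fires (activation > 0). The matrices, sizes, and function names here are hypothetical toy values and are not the dashboard's actual implementation.

```python
import numpy as np

def top_logits(w_dec_row: np.ndarray, W_U: np.ndarray, k: int = 10):
    """Project one feature's decoder direction through a (hypothetical)
    unembedding matrix and return the k most positive and k most negative
    (token_index, logit_effect) pairs."""
    logits = w_dec_row @ W_U                  # shape: (d_vocab,)
    order = np.argsort(logits)                # ascending order
    neg = [(int(i), float(logits[i])) for i in order[:k]]        # most negative first
    pos = [(int(i), float(logits[i])) for i in order[::-1][:k]]  # most positive first
    return pos, neg

def activation_density(acts: np.ndarray) -> float:
    """Fraction of token positions where the feature's activation is > 0."""
    return float((acts > 0).mean())

rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
W_U = rng.normal(size=(d_model, d_vocab))  # toy unembedding matrix (hypothetical)
w_dec_row = rng.normal(size=d_model)       # toy decoder row for one feature (hypothetical)
pos, neg = top_logits(w_dec_row, W_U, k=5)

acts = np.zeros(10_000)
acts[:11] = 1.0  # feature active at 11 of 10,000 positions
print(f"{activation_density(acts):.3%}")  # 0.110%
```

On this reading, a density of 0.109% means the feature fires on roughly 1 token in 900, which fits a narrowly targeted feature such as one that promotes " its"/" Its".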