    Negative Logits
    Token             Logit
    " Musik"          -0.08
    "okers"           -0.07
    "юдж"             -0.06
    " Face"           -0.06
    "LOOD"            -0.06
    " Matchers"       -0.06
    ".ticket"         -0.06
    ".Pixel"          -0.06
    " Masks"          -0.06
    " nghiệm"         -0.06
    Positive Logits
    Token             Logit
    " Col"            0.07
    "Wei"             0.07
    " stě"            0.07
    " collar"         0.07
    " intermediate"   0.06
    "Deck"            0.06
    " dne"            0.06
    "_small"          0.06
    "/tiny"           0.06
    " hton"           0.06
    Activation Density: 0.024%

    No Known Activations