    Negative Logits
    Token              Logit
    " concentração"    -0.08
    " chimp"           -0.08
    " Alc"             -0.08
    " cid"             -0.08
    " oss"             -0.08
    "(Vec"             -0.08
    "<|endoftext|>"    -0.08
    "χεται"            -0.07
    " Egyptian"        -0.07
    (blank)            -0.07
    Positive Logits
    Token              Logit
    " don't"           0.08
    "don"              0.07
    "गे"               0.07
    "mund"             0.07
    " teamwork"        0.07
    "POL"              0.07
    " बाद"             0.07
    " bung"            0.07
    " क्योंकि"          0.07
    (blank)            0.07
    Activation Density: 0.071%

    No Known Activations