INDEX
    Explanations

    instances of the word "up."

    Negative Logits
    nist      -0.17
    onne      -0.16
    uss       -0.16
    edom      -0.15
    ighbor    -0.15
    zas       -0.14
    nal       -0.14
     нам      -0.14
    eb        -0.14
    ersen     -0.13

    Positive Logits
    oids      0.16
    alim      0.15
    mps       0.14
    Mixin     0.14
    062       0.14
     with     0.14
    065       0.14
    TOTYPE    0.14
    ç̬         0.13
    idity     0.13
    Act Density 0.007%

    No Known Activations