    Explanations

    repetitive phrases or terms that suggest an additive structure in the text

    Negative Logits
     allerdings
    -0.17
    amen
    -0.17
     however
    -0.16
     and
    -0.15
    inho
    -0.13
     either
    -0.13
    esson
    -0.13
    nt
    -0.13
    atoes
    -0.13
     rd
    -0.13
    Positive Logits
    ebek
    0.18
     importantly
    0.17
    /OR
    0.17
    acen
    0.16
    vice
    0.16
    来说
    0.15
    forth
    0.14
     vice
    0.14
    eyen
    0.14
    yor
    0.14
    Act Density 0.174%

    No Known Activations