INDEX
    Explanations

    the word "but" and similar conjunctions indicating contrast or opposition

    Negative Logits
    "uppe"          -0.17
    "бо"            -0.16
    "§"             -0.16
    "olly"          -0.16
    "ops"           -0.16
    " therefore"    -0.16
    " alike"        -0.15
    "xies"          -0.15
    "ziel"          -0.15
    "beth"          -0.15
    Positive Logits
    "term"          0.24
    "lers"          0.24
    "chers"         0.24
    "ressing"       0.24
    "ler"           0.23
    "ters"          0.22
    "ressed"        0.22
    "åĩ¡"           0.21
    "rint"          0.21
    "tpl"           0.21
    Act Density 0.048%

    No Known Activations