INDEX
    Explanations

    conjunctions and variations of the word "but."

    Negative Logits
    бо          -0.17
    ziel        -0.17
    strcasecmp  -0.17
    olly        -0.16
    ï¸ı         -0.15
     alike      -0.15
    pping       -0.15
    beth        -0.15
    pon         -0.15
    xies        -0.14
    Positive Logits
    term        0.25
    lers        0.24
    ters        0.24
    rint        0.24
    chers       0.23
    ler         0.23
    ressed      0.23
    ts          0.23
    åĩ¡         0.22
    ressing     0.22
    Act Density 0.047%

    No Known Activations