INDEX
    Explanations

    instances of the word "but," signaling contrasts or exceptions in discourse

    Negative Logits
    /memory       -0.16
    κÏĮ           -0.15
    Ïĩο           -0.14
    acs           -0.14
     ourselves    -0.14
    ızı           -0.14
    pts           -0.14
    elin          -0.13
    .sax          -0.13
    구            -0.13
    Positive Logits
    LER           0.15
    forth         0.14
    że            0.14
    lbrace        0.14
    /www          0.14
    ipeg          0.14
    oldt          0.14
    tery          0.14
    ERG           0.14
    adan          0.14
    Activation Density: 0.134%

    No Known Activations