INDEX
    Explanations

    occurrences of the word "but" with a higher activation value than other tokens

    the conjunction "but" indicating contrasts or exceptions

    Negative Logits
    Token              Logit
    " Cycle"           -0.68
    " Excellence"      -0.62
    "naire"            -0.55
    " Improvement"     -0.54
    " Procedures"      -0.53
    "ĺħ"               -0.53
    "ampa"             -0.53
    " stimulus"        -0.52
    "ESH"              -0.52
    " pursu"           -0.51

    Positive Logits
    Token              Logit
    "chers"             1.39
    "chery"             1.28
    "tons"              1.14
    "ts"                1.04
    " alas"             0.97
    "ted"               0.96
    " nevertheless"     0.92
    "ters"              0.92
    " nonetheless"      0.90
    "cher"              0.90

    Act Density 0.089%

    No Known Activations