    Explanations

    Instances of the word "but", indicating a contrast or an exception in the text.

    Negative Logits
        Token              Logit
        " and"             -0.76
        "himself"          -0.66
        "herself"          -0.64
        " THOUGH"          -0.63
        "Though"           -0.62
        " Though"          -0.58
        " entanto"         -0.56
        " Accordingly"     -0.56
        "them"             -0.51
        " bzw"             -0.50

    Positive Logits
        Token              Logit
        " then"             1.39
        " hey"              1.18
        " alas"             1.15
        " unfortunately"    1.06
        " also"             1.03
        " it"               0.97
        " luckily"          0.96
        " yeah"             0.94
        " if"               0.94
        "chery"             0.93

    Act Density 0.160%

    No Known Activations