INDEX
    Explanations

    words related to contrasting or stating exceptions

    the word "but" in various contexts

    Negative Logits
    Token       Logit
    "roy"       -0.74
    "ump"       -0.67
    "ogical"    -0.65
    "uto"       -0.65
    "tnc"       -0.64
    "ogo"       -0.63
    "uly"       -0.63
    "edu"       -0.61
    "ands"      -0.61
    "urdy"      -0.61
    Positive Logits
    Token               Logit
    "tons"              1.22
    " alas"             1.05
    " nevertheless"     0.98
    "chery"             0.93
    " fortunately"      0.91
    " unfortunately"    0.89
    " nonetheless"      0.89
    " luckily"          0.88
    "chers"             0.85
    " preferably"       0.84
    Activation Density: 0.219%

    No Known Activations