INDEX
    Explanations

    contrasting relationships or distinctions between concepts

    instances of the word "but" indicating contrastive statements

    Negative Logits
    "uters"     -0.79
    "uter"      -0.77
    "unction"   -0.76
    "minent"    -0.74
    "roy"       -0.74
    "velt"      -0.72
    "alty"      -0.72
    "enter"     -0.71
    "uther"     -0.71
    "ct"        -0.70
    Positive Logits
    " nor"            0.99
    " suffice"        0.92
    " nevertheless"   0.85
    " alas"           0.80
    " rather"         0.78
    " merely"         0.78
    " luckily"        0.77
    " hey"            0.76
    " fortunately"    0.75
    "chery"           0.75
    Act Density 0.105%

    No Known Activations