INDEX
    Explanations

    Contradictions or unexpected combinations within a sentence

    The word "yet" in various contexts, indicating contrast or exception

    Negative Logits
    "rities"      -0.86
    "tein"        -0.71
    "regate"      -0.68
    "edu"         -0.67
    "ancial"      -0.66
    "scribe"      -0.66
    "rats"        -0.66
    "gnu"         -0.64
    "puters"      -0.63
    "ilitating"   -0.63
    Positive Logits
    " somehow"    0.79
    " again"      0.68
    " manages"    0.67
    " inexpl"     0.67
    " strangely"  0.67
    " mirac"      0.66
    "heric"       0.66
    " managed"    0.63
    "forth"       0.63
    " surpass"    0.63
    Activation Density: 0.023%

    No Known Activations