INDEX
    Explanations

    sentences indicating understanding or comprehension

    expressions of comprehension or understanding

    Negative Logits
    Token           Logit
    "DOWN"          -0.66
    "mage"          -0.65
    "aunder"        -0.64
    "rouse"         -0.63
    "izons"         -0.62
    "artney"        -0.62
    "patch"         -0.61
    " psychiat"     -0.60
    "ield"          -0.60
    "onding"        -0.60
    Positive Logits
    Token             Logit
    " myself"         0.94
    " Citation"       0.63
    "ANA"             0.61
    " firsthand"      0.60
    " Kahn"           0.59
    " count"          0.58
    " reiterate"      0.58
    "poke"            0.57
    " regrets"        0.57
    " exaggeration"   0.56
    Act Density 0.371%

    No Known Activations