INDEX
    Explanations

    interrogative sentences ending in 'it' with an emphasis on high activation values

    rhetorical questions and conversational phrases

    Negative Logits
    furt        -0.79
    umbn        -0.66
    eric        -0.64
    izont       -0.63
    aeper       -0.60
    orsi        -0.60
    Stand       -0.60
    ternity     -0.59
    esm         -0.57
    aneers      -0.57
    Positive Logits
    ?!          0.95
    ?           0.88
    !?          0.82
    ??          0.81
    ?!"         0.79
     ?          0.76
    ?'          0.74
     adorable   0.73
    !?"         0.73
    ?"          0.72
    Activation Density: 0.042%

    No Known Activations