INDEX
    Explanations

    questions beginning with "Why"

    rhetorical questions that challenge reasoning or assumptions

    Negative Logits
    Token             Logit
    "interstitial"    -0.66
    "覚醒"            -0.65
    "ILY"             -0.63
    "eki"             -0.60
    "unal"            -0.60
    "aukee"           -0.60
    "iece"            -0.60
    "ipes"            -0.59
    "apsed"           -0.59
    "tips"            -0.59
    Positive Logits
    Token             Logit
    " bother"         1.19
    " shouldn"        1.07
    " wouldn"         1.05
    " aren"           1.04
    " didn"           1.01
    " did"            1.00
    " does"           0.99
    " hasn"           0.98
    " don"            0.95
    " should"         0.93
    Act Density 0.044%

    No Known Activations