INDEX
    Explanations

    sentences with emotional or expressive language

    expressions of personal experience and opinion

    Negative Logits
    Token          Logit
    "ggles"        -0.66
    "otin"         -0.63
    "eware"        -0.63
    "oliberal"     -0.62
    " dismant"     -0.60
    "habi"         -0.60
    "ctuary"       -0.58
    " awa"         -0.58
    "gae"          -0.57
    "ependent"     -0.57
    Positive Logits
    Token          Logit
    " hadn"        1.03
    "Had"          1.01
    "Was"          1.00
    " outwe"       0.98
    " lacked"      0.94
    " consisted"   0.94
    " tended"      0.93
    "didn"         0.92
    " didn"        0.88
    " smelled"     0.85
    Activation Density: 1.455%

    No Known Activations