INDEX
    Explanations

    introspective statements and self-reflection

    Negative Logits
    Token          Logit
    "dding"        -0.66
    "mma"          -0.62
    "ichever"      -0.61
    "cknow"        -0.61
    "noticed"      -0.59
    "herent"       -0.58
    " Friendly"    -0.57
    " Columb"      -0.56
    "tted"         -0.55
    " EW"          -0.55
    Positive Logits
    Token          Logit
    " entails"     1.06
    "alian"        0.92
    " boils"       0.91
    " hurts"       0.86
    " happened"    0.85
    " transpired"  0.85
    " happens"     0.84
    " feels"       0.83
    " takes"       0.80
    " rains"       0.80
    Act Density 0.081%

    No Known Activations