INDEX
    Explanations

    expressions of regret or realizations of past mistakes

    Head Attr Weights
    0:0.08
    1:0.05
    2:0.00
    3:0.11
    4:0.09
    5:0.09
    6:0.04
    7:0.03
    8:0.30
    9:0.12
    10:0.01
    11:0.02
    Negative Logits
    "api"               -1.64
    " Alexa"            -1.63
    "pour"              -1.52
    "guide"             -1.52
    " bots"             -1.51
    " Miracle"          -1.49
    (no token shown)    -1.49
    "virt"              -1.49
    "emouth"            -1.47
    "tower"             -1.42
    Positive Logits
    " recalled"         1.71
    " recol"            1.67
    "commit"            1.67
    " hindsight"        1.67
    " cringe"           1.66
    " commit"           1.66
    " handwritten"      1.65
    " reminis"          1.64
    " recalling"        1.64
    "rences"            1.61
    Act Density 0.001%

    No Known Activations