INDEX
    Explanations

    phrases expressing feelings of reward, satisfaction, and emotional responses to experiences

    Negative Logits
    Token            Logit
    "legal"          -0.64
    "Phill"          -0.62
    "Newsletter"     -0.60
    "Bridge"         -0.59
    "Writer"         -0.59
    "Roy"            -0.58
    " Ashton"        -0.58
    "hawks"          -0.58
    " merger"        -0.57
    "umar"           -0.56
    Positive Logits
    Token            Logit
    " yourself"      1.16
    " yourselves"    0.98
    " Yourself"      0.83
    " wasting"       0.76
    " temptation"    0.74
    " your"          0.74
    " wondering"     0.73
    "ウス"           0.72
    " wiser"         0.71
    " forgiven"      0.70
    Activation Density: 0.264%

    No Known Activations