INDEX
    Explanations

    positive experiences and transformation through challenging situations

    Negative Logits
        " affor"       -1.18
        " increa"      -1.08
        " kani"        -1.07
        " Simult"      -1.04
        " embodi"      -1.03
        " fta"         -1.02
        " volunte"     -1.00
        " PLW"         -1.00
        " haup"        -0.99
        " unlaw"       -0.98
    Positive Logits
        " thought"      0.72
        "thought"       0.71
        " thinking"     0.65
        " knew"         0.64
        "<bos>"         0.61
        " assumed"      0.60
        " hadn"         0.58
        " wasn"         0.58
        " initially"    0.58
        " myself"       0.57
    Activation Density: 0.645%

    No Known Activations