    Explanations

    phrases related to personal stories or experiences

    mentions of personal experiences

    Negative Logits
    Token (quoted)   Logit
    "sub"            -0.73
    "hatt"           -0.71
    "vous"           -0.66
    " tumor"         -0.64
    "cut"            -0.62
    "apo"            -0.62
    "law"            -0.61
    "tra"            -0.61
    "cise"           -0.61
    " Sabha"         -0.61
    Positive Logits
    Token (quoted)   Logit
    " experiences"    1.22
    " Experience"     0.97
    "iences"          0.97
    " experien"       0.95
    " experience"     0.91
    "Experience"      0.85
    "ttes"            0.82
    " Exper"          0.82
    "OWS"             0.80
    "ivities"         0.79
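    A minimal sketch of how logit lists like the two above are commonly produced, assuming the dashboard scores each vocabulary token by the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U (any final layer norm ignored). The function name `top_logits`, its arguments, and the toy numbers are hypothetical illustrations, not the dashboard's actual code or data.

```python
from typing import List, Tuple


def top_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by a feature's direct effect on the output logits (hypothetical helper).

    decoder_dir: the feature's decoder direction, length d_model (assumed input).
    unembed: unembedding matrix W_U as d_model rows of vocab_size floats (assumed).
    vocab: the vocab_size token strings, in the same column order as W_U.
    Returns (most positive, most negative), each a list of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of the decoder direction with column j of W_U.
    scores = [
        sum(decoder_dir[d] * unembed[d][j] for d in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    # Highest scores first for the positive list, lowest first for the negative list.
    most_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    most_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return most_positive, most_negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = [" experiences", " tumor", "sub", " the"]
    w_u = [[0.9, 0.1, -0.4, 0.2],
           [-0.6, 0.2, 0.5, 0.0]]
    pos, neg = top_logits([1.0, -0.5], w_u, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

    Under this convention, a feature whose decoder direction aligns with the unembeddings of " experiences", " Experience", and similar fragments would push those tokens up, which is consistent with the positive list above.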
    Activation density: 0.016%
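    A density of 0.016% corresponds to the feature firing on about 16 of every 100,000 tokens (roughly 1 in 6,250). The sketch below assumes density is the percentage of tokens whose activation is strictly above a threshold of zero; the threshold and token sample the dashboard actually uses are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 16 firing tokens out of 100,000 gives the 0.016% shown above.
    acts = [0.0] * 99_984 + [1.0] * 16
    print(f"{activation_density(acts):.3f}%")
```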

    No Known Activations