INDEX
    Explanations

    words related to personal encounters or interactions

    references to the concept of 'experience.'

    Negative Logits
        Token         Logit
        "hatt"        -0.79
        "vous"        -0.76
        "law"         -0.71
        "ependent"    -0.70
        "fam"         -0.68
        " landsl"     -0.63
        "yright"      -0.61
        "sub"         -0.61
        "yrics"       -0.60
        "trap"        -0.60
    Positive Logits
        Token             Logit
        " Experience"     1.18
        "Experience"      1.09
        " experience"     1.04
        " experiences"    1.01
        "ttes"            0.87
        "IENCE"           0.83
        " experien"       0.82
        "OWS"             0.80
        "ually"           0.78
        "iences"          0.78
    Activation Density: 0.025%

    No Known Activations