INDEX
    Explanations

    phrases related to personal experiences, opinions, and actions in various situations

    Negative Logits
    "selves"         -0.87
    " unison"        -0.82
    "hub"            -0.79
    " respective"    -0.69
    " respectively"  -0.63
    "merce"          -0.58
    " Authors"       -0.58
    " ourselves"     -0.57
    "mination"       -0.57
    " Helpful"       -0.57
    Positive Logits
    " himself"       1.77
    " Himself"       1.19
    " his"           1.15
    " herself"       1.01
    " charisma"      0.80
    " personally"    0.80
    " subordinates"  0.79
    " persona"       0.76
    " wife"          0.76
    "His"            0.74
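    Lists like these are commonly produced by projecting a feature's output (decoder) direction through the model's unembedding matrix, then keeping the most negative and most positive token scores. The sketch below assumes that procedure. `direction`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names and are not taken from this page.

```python
import numpy as np

def top_logit_effects(direction, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of a feature direction.

    Assumes `direction` has shape (d_model,), `W_U` has shape
    (d_model, vocab_size), and `vocab` maps token index -> token string.
    Returns (negative, positive), each a list of k (token, score) pairs.
    """
    logits = direction @ W_U          # one score per vocabulary token
    order = np.argsort(logits)        # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: with an identity unembedding, scores equal the direction itself.
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), vocab, k=2)
print(neg, pos)
```

    Sorting once with `argsort` and reading both ends avoids a second pass. For a real vocabulary, `np.argpartition` would be cheaper when only the top k entries are needed.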
    Activation Density 4.563%
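    Activation density is usually reported as the percentage of sampled tokens on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of 0. This page states neither the threshold nor the sample size, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: one firing token out of four.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```

    Under this definition, a density of 4.563% means the feature was active on about 46 of every 1,000 sampled tokens.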

    No Known Activations