    Explanations

    references to self-affirmation and personal identity

    (Quoted text is the exact token, including any leading space.)

    Negative Logits
    " their"        -0.74
    " leurs"        -0.68
    " ihre"         -0.64
    "glected"       -0.63
    " ihrer"        -0.61
    "在我的"        -0.58
    " Roskov"       -0.54
    "Their"         -0.52
    "Diwedd"        -0.52
    " cherchés"     -0.52

    Positive Logits
    " yourself"     2.44
    " Yourself"     2.04
    " YOURSELF"     1.83
    "yourself"      1.79
    "Yourself"      1.54
    " thyself"      1.47
    " yourselves"   1.38
    " oneself"      1.15
    " himſelf"      1.07
    " itſelf"       1.02
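    The page lists these logits but does not say how they were computed. One common approach for a sparse-autoencoder feature is to multiply the feature's decoder direction by the model's unembedding matrix and rank the vocabulary by the result. That approach is assumed here, not confirmed by the source. The sketch below does this in pure Python. `feature_logits`, its argument names, and the row-per-model-dimension layout of `w_u` are hypothetical, and the toy weights are made up.

```python
from typing import List, Sequence, Tuple

TokenLogit = Tuple[str, float]


def feature_logits(
    decoder_dir: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenLogit], List[TokenLogit]]:
    """Rank tokens by how strongly a feature's decoder direction pushes their logits.

    Assumed (hypothetical) layout: decoder_dir has length d_model and w_u has
    d_model rows, each with one entry per vocabulary token.
    Returns (top_positive, top_negative), each a list of (token, logit) pairs.
    """
    if len(decoder_dir) != len(w_u):
        raise ValueError("decoder_dir and w_u must share the d_model dimension")
    logits = [0.0] * len(vocab)
    # Direct logit contribution: logits[j] = sum_i decoder_dir[i] * w_u[i][j]
    for x, row in zip(decoder_dir, w_u):
        if len(row) != len(vocab):
            raise ValueError("each w_u row needs one entry per token")
        for j, w in enumerate(row):
            logits[j] += x * w
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]  # most negative first
    top_positive = ranked[::-1][:k]  # most positive first
    return top_positive, top_negative


# Toy example: a made-up 2-dimensional model with a 3-token vocabulary.
vocab = [" yourself", " their", " the"]
pos, neg = feature_logits([1.0, 0.5], [[2.0, -1.0, 0.0], [0.5, 0.2, 0.1]], vocab, k=1)
print(pos, neg)
```

    Because the projection is linear, a feature whose direction aligns with the unembedding rows for " yourself"-like tokens would show the pattern listed above. Whether the dashboard also applies a bias or normalization is not stated.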
    Activation Density 0.076%
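    The page does not define activation density. It is presumably the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation above a threshold of zero. Under that reading, 0.076% works out to about 76 tokens in every 100,000. The function name and threshold parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    fired = 0
    for a in activations:
        total += 1
        if a > threshold:
            fired += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * fired / total


# Made-up activations: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 0.3, 0.0]))
```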

    No Known Activations