INDEX
    Explanations

    possessive pronouns related to the user

    Negative Logits
    Token            Logit
    " himself"       -0.15
    " themselves"    -0.15
    "lights"         -0.15
    "ald"            -0.15
    "positories"     -0.14
    "istra"          -0.14
    "erty"           -0.14
    "ói"             -0.14
    "447"            -0.14
    "ino"            -0.14

    Positive Logits
    Token            Logit
    " yourself"      0.23
    "anmar"          0.21
    "nger"           0.20
    "essler"         0.19
    " guys"          0.19
    "opia"           0.17
    "ths"            0.17
    " Yourself"      0.17
    "’re"            0.16
    "zon"            0.15
    Act Density 0.202%

    No Known Activations