INDEX
    Explanations

    personal pronouns referring to oneself

    references to personal experience and identity

    Negative Logits
    Token         Logit
    "ibaba"       -0.96
    "pload"       -0.72
    "ornia"       -0.68
    "etheus"      -0.67
    "ulton"       -0.67
    "atlantic"    -0.65
    "yson"        -0.64
    "ģĸ"          -0.63
    "otine"       -0.63
    "odge"        -0.62
    Positive Logits
    Token            Logit
    "selves"         0.87
    " dearly"        0.85
    " personally"    0.79
    "atically"       0.78
    "terday"         0.78
    "atic"           0.76
    " uncond"        0.76
    "self"           0.75
    "Redditor"       0.72
    "atics"          0.70
    Activation Density: 0.162%

    No Known Activations