    Explanations

    expressions of personal identity and self-reference

    Negative Logits
     ourselves    -0.79
     our          -0.72
     we           -0.65
     Our          -0.63
     OUR          -0.63
     nossos       -0.59
     their        -0.59
     bizi         -0.57
    Our           -0.57
    我们的        -0.55

    Positive Logits
     myself       0.80
    EndContext    0.76
     I            0.75
     my           0.74
    Tôi           0.73
     moje         0.72
    myself        0.71
    讓我          0.70
    私の          0.70
     אני          0.69
    Activation Density: 0.125%

    No Known Activations