INDEX
    Explanations

    pronouns and questions related to identity and belonging

    Negative Logits
    uzey           -0.16
    uku            -0.16
    isque          -0.15
    ones           -0.15
    pose           -0.14
    uess           -0.14
    deny           -0.14
    one            -0.14
    WithOptions    -0.14
    é¡į            -0.13
    Positive Logits
     she           0.16
     THEY          0.16
     HE            0.16
    олÑİ           0.15
     they          0.15
    864            0.14
    itis           0.14
    830            0.14
     WE            0.14
     he            0.13
    Act Density 0.114%

    No Known Activations