    Explanations

    references to personal identity and relationships

    Negative Logits
    -0.68  "]').
    -0.62   "").
    -0.62   })}
    -0.60  ",(
    -0.58   "))
    -0.57  目は
    -0.56  ())).
    -0.56  》.
    -0.56  ",{
    -0.56  ").
    Positive Logits
    0.96   YOURSELF
    0.90  Myself
    0.89   ourselves
    0.88  Yourself
    0.88   Myself
    0.87  selves
    0.86  myself
    0.86   Yourself
    0.80   comigo
    0.78   yourself
    Activation Density: 0.221%

    No Known Activations