    Negative Logits
    " your"          -1.48
    "your"           -1.33
    " yourselves"    -1.13
    " yourself"      -1.09
    " yours"         -1.09
    "Your"           -1.07
    " youre"         -1.07
    "yours"          -0.98
    "ของคุณ"          -0.98
    "你的"            -0.91

    Positive Logits
    " their"          0.96
    " Their"          0.93
    "Their"           0.88
    "their"           0.78
    "themselves"      0.75
    " themselves"     0.73
    " ihre"           0.71
    " ihren"          0.70
    " ihrer"          0.68
    " THEIR"          0.65

    Activation Density: 0.049%

    No Known Activations