INDEX
    Explanations

    references to the second person "you" in various contexts

    Negative Logits
    " their"        -0.50
                    -0.48
    "她们"          -0.46
    "Their"         -0.45
    " Leurs"        -0.45
    "她們"          -0.44
    " leurs"        -0.44
    " Their"        -0.42
    "他們的"        -0.41
    "他们的"        -0.39
    Positive Logits
    " guys"         1.54
    " yourself"     1.32
    " yourselves"   1.02
    "guys"          0.96
    "yourself"      0.94
    "tubers"        0.94
    " GUYS"         0.90
    "Guys"          0.85
    " Guys"         0.82
    " Yourself"     0.82
    Activation Density: 0.199%

    No Known Activations