INDEX
    Explanations

    phrases related to social dynamics and interpersonal relationships

    Negative Logits
    them        -0.71
    Them        -0.68
    selves      -0.67
    ſelves      -0.61
     hennes     -0.61
    herself     -0.58
     Them       -0.58
    給我        -0.56
    Him         -0.56
     Yourself   -0.55
    Positive Logits
     we         1.58
     they       1.52
     that       1.38
     you        1.25
     he         1.17
     she        1.03
     everyone   0.89
     mà         0.87
     someone    0.84
     mình       0.83
    Act Density 1.215%

    No Known Activations