INDEX
    Explanations

    emotional responses and actions related to care and interpersonal relationships

    Negative Logits
    她们
    -0.20
    它们
    -0.18
     THEY
    -0.15
    uada
    -0.15
    usi
    -0.14
    ategorized
    -0.14
     вони
    -0.14
    okers
    -0.14
    -fw
    -0.13
    uchos
    -0.13
    Positive Logits
     him
    1.55
    him
    1.09
     lui
    1.05
     Him
    0.98
     ihn
    0.93
     ihm
    0.91
     HIM
    0.79
     него
    0.74
     ему
    0.69
     нему
    0.67
    Activation Density: 0.978%

    No Known Activations