INDEX
    Explanations

    references to caregiving, responsibility, and community interactions

    NEGATIVE LOGITS
    Token       Logit
    "ente"      -0.16
    "ovalo"     -0.15
    "iam"       -0.15
    " <$>"      -0.15
    "iami"      -0.14
    "IPH"       -0.14
    "اور"       -0.14
    "ekli"      -0.14
    "entes"     -0.14
    "jin"       -0.13
    POSITIVE LOGITS
    Token       Logit
    " him"      0.46
    " she"      0.44
    " his"      0.37
    "她"        0.34
    " her"      0.34
    "/she"      0.33
    " he"       0.32
    "she"       0.32
    " herself"  0.32
    " s"        0.32
    Activation Density: 0.599%

    No Known Activations