INDEX
    Explanations

    actions related to teaching, assisting, and supporting others

    Negative Logits
     ours   -0.07
    arest   -0.07
    rax     -0.07
    ebin    -0.07
    siz     -0.06
    ший     -0.06
    ’S      -0.06
    ieur    -0.06
    irket   -0.06
    ambre   -0.06
    Positive Logits
     them   0.18
     their  0.16
    他们    0.14
     they   0.14
     ihnen  0.14
    他們    0.13
     их     0.13
    their   0.13
     loro   0.12
    them    0.12
    Act Density 0.086%

    No Known Activations