INDEX
    Explanations

    actions directed at them

    Negative Logits
    Token       Logit
    "Their"     1.62
    " Their"    1.57
    " Its"      1.42
    " their"    1.38
    "their"     1.37
    " leurs"    1.35
    "Its"       1.34
    " leur"     1.30
    " deres"    1.29
    " ihre"     1.25
    Positive Logits
    Token       Logit
    " them"     2.56
    "them"      2.17
    " ones"     2.08
    " Them"     1.82
    " них"      1.71
    " அவற்றை"   1.66
    " देम"      1.63
    " THEM"     1.62
    " ними"     1.54
    " সেগুলি"   1.51
    Act Density 0.405%

    No Known Activations