    Explanations

    actions directed at them

    Negative Logits
    "mselves"       0.50
    " zichzelf"     0.47
    "自己在"         0.40
    " eponymous"    0.40
                    0.40
    " fazem"        0.40
    " sich"         0.38
    " fanno"        0.38
    "自己"           0.37
    " Pilgrim"      0.37
    Positive Logits
    " them"         2.03
    " ihnen"        2.02
    " त्यांना"        1.98
    "them"          1.88
    "他们"           1.85
    " тях"          1.85
    " அவர்களை"      1.84
    " onların"      1.84
    " них"          1.82
    "他們"           1.82
    Activation Density 0.016%

    No Known Activations