    Explanations

    actions performed by himself

    Negative Logits
    Token            Logit
    "your"           0.64
    " yourselves"    0.63
    " your"          0.61
    "Your"           0.57
    " yourself"      0.56
    " him"           0.55
    " lui"           0.53
    " iyong"         0.53
    [token missing]  0.53
    "yourself"       0.52
    Positive Logits
    Token            Logit
    " himself"       1.48
    " తన"            1.16
    " തന്റെ"          1.13
    " نفسه"          1.00
    " his"           0.97
    " Himself"       0.93
    " 자신의"         0.92
    " தனது"          0.92
    " ತನ್ನ"           0.86
    " своему"        0.83
    Activation Density: 0.015%

    No Known Activations