    NEGATIVE LOGITS
    Token           Logit
    "odnev"         0.22
    " कहती"         0.22
    " Amended"      0.22
    " tačiau"       0.21
    "ysteine"       0.21
    " its"          0.21
    "يارات"         0.20
    " illetve"      0.20
    " নিজেদের"      0.20
    " svojih"       0.19
    POSITIVE LOGITS
    Token           Logit
    " himself"      0.60
    "ගේ"            0.34
    " نفسه"         0.33
    " Himself"      0.32
    " hims"         0.30
    "和他"          0.30
    "ův"            0.28
    " Jr"           0.28
    " करतो"         0.28
    " мог"          0.27
    Activation Density: 0.029%

    No Known Activations