INDEX
    Explanations

    actions performed by 'he'

    Negative Logits
    " Him"        1.12
    " Their"      1.06
    " deras"      0.99
    " theyre"     0.97
    " Worst"      0.94
    " mereka"     0.94
    " Mr"         0.93
    " Laser"      0.92
    " Worth"      0.92
    " Husband"    0.92
    Positive Logits
    " himself"    1.55
    "his"         1.39
    " his"        1.24
    " seiner"     1.07
    " seine"      1.05
    "但他"        1.03
    "了他的"      1.02
    "他的"        0.95
    " мог"        0.91
    " kanyang"    0.88
    Activation Density: 0.016%

    No Known Activations