INDEX
    NEGATIVE LOGITS
    Token      Logit
    " her"     1.11
    " ей"      0.99
    " её"      0.91
    " ее"      0.90
    " її"      0.89
    " její"    0.87
    " haar"    0.86
    "她的"     0.84
    " 그녀"    0.83
    " яе"      0.80
    POSITIVE LOGITS
    Token      Logit
    "ienen"    0.91
    "]"        0.88
    " mereka"  0.88
    "]["       0.87
    "They"     0.82
    "they"     0.80
    "”]"       0.80
    "]("       0.79
    "],"       0.79
    "mselves"  0.78
    Activation Density: 0.028%

    No Known Activations