    Explanations

    pronouns and descriptions

    Negative Logits
    them     0.56
     them    0.47
    _$       0.40
     THEM    0.40
    Manuel   0.39
     Them    0.38
    Paul     0.38
     উইল     0.38
     őket    0.37
     देम     0.37
    Positive Logits
     그는        0.59
     he         0.51
     তিনি       0.51
     she        0.48
     there      0.48
     він        0.48
     അദ്ദേഹം    0.47
     почув      0.47
     он         0.46
     она        0.46
    Act Density 0.000%

    No Known Activations