INDEX
    Explanations

    references to plural pronouns, particularly "they."

    Negative Logits
    " pomo"       -0.67
    "を取る"      -0.58
    " Paar"       -0.55
    "mem"         -0.54
    " Magenta"    -0.53
    "almaz"       -0.53
    " Staub"      -0.52
    "біль"        -0.52
    " mín"        -0.52
    "fael"        -0.52
    Positive Logits
    " they"       2.21
    "They"        2.01
    "they"        1.96
    " They"       1.95
    "THEY"        1.95
    " THEY"       1.88
    " he"         1.59
    "Their"       1.39
    "Mereka"      1.35
    " они"        1.34
    Act Density 0.094%

    No Known Activations