    Explanations

    modal verb followed by pronoun

    Negative Logits
    ance  0.48
    ry  0.46
    (token not shown)  0.44
    swith  0.43
    (token not shown)  0.43
    器的  0.42
    ôté  0.42
    بعاد  0.42
    +  0.41
     carácter  0.41
    Positive Logits
     you  1.11
     it  1.04
     they  1.01
     आपण  0.96
     we  0.96
    我們  0.93
    我们  0.86
     chúng  0.84
    เรา  0.84
     мы  0.82
    Activation Density: 0.098%

    No Known Activations