    Explanations

    quantifiers followed by us

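    Negative Logits (token, logit; quotation marks delimit each token so a leading space is visible)
        " These"      0.56
        "これらの"     0.52   Japanese, "these"
        "These"       0.50
        " Эти"        0.48   Russian, "these"
        "these"       0.48
        " these"      0.46
        " эти"        0.46   Russian, "these"
        " thefe"      0.44   "these" with the long s (ſ) read as f
        "Эти"         0.44   Russian, "these"
        " quei"       0.43   Italian, "those"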
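
    Positive Logits (token, logit; quotation marks delimit each token so a leading space is visible)
        " us"         1.07
        " everyone"   0.99
        " нас"        0.95   Russian, "us"
        "Everyone"    0.92
        "everyone"    0.83
        " everybody"  0.80
        " Everyone"   0.80
        "大家"         0.79   Chinese, "everyone"
        "ทุกคน"        0.77   Thai, "everyone"
        " iedereen"   0.75   Dutch, "everyone"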
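
    Lists like the two above are commonly produced by projecting a feature's decoder vector through the model's unembedding matrix and taking the highest and lowest scoring tokens. The page does not say how its lists were computed, so the sketch below is an assumption, not the dashboard's actual code. The names top_logits, decoder_dir and W_U, and the toy data, are hypothetical. Scores are kept signed here; how the page displays negative values is not assumed.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct effect of one feature direction on their logits.

    decoder_dir: (d_model,) decoder vector of a single feature (hypothetical input).
    W_U: (d_model, d_vocab) unembedding matrix (hypothetical input).
    Returns (positive, negative): the k highest and k lowest (token, score) pairs.
    """
    scores = decoder_dir @ W_U          # (d_vocab,) signed effect on each token's logit
    order = np.argsort(scores)          # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: a 4-token vocabulary with an identity unembedding (d_model = d_vocab = 4).
vocab = [" us", " everyone", " these", " the"]
W_U = np.eye(4)
decoder_dir = np.array([1.0, 0.9, -0.5, 0.0])   # hypothetical feature direction
pos, neg = top_logits(decoder_dir, W_U, vocab, k=2)
print(pos)  # highest-scoring tokens first
print(neg)  # lowest-scoring tokens first
```

    In a real model, W_U has tens of thousands of columns. That is why near-synonyms in several languages (" us", " нас", "大家") can all appear among the top tokens.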
    Activation density: 0.011% (share of tokens on which the feature fires)
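
    A density of 0.011% means roughly 11 active tokens per 100,000, or about 1 in 9,000. The sketch below shows that arithmetic. It assumes density means the percentage of tokens whose activation exceeds zero. The function name activation_density and the threshold parameter are hypothetical, and the page does not state its exact threshold.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which a feature's activation exceeds `threshold`.

    acts: 1-D array of the feature's activation on each token (hypothetical input).
    threshold: assumed cutoff for counting a token as active (0.0 by default).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 100,000 tokens, of which 11 have a positive activation.
acts = np.zeros(100_000)
acts[:11] = 0.3
density = activation_density(acts)
print(f"{density:.3f}%")
```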

    No Known Activations