INDEX
    Explanations

    the word "We" and related forms, indicating a focus on collective or shared experiences

    Negative Logits
    Token                 Logit
    " Thon"               -0.77
    " Chy"                -0.77
    "WithIOException"     -0.75
    "Chy"                 -0.69
    " pathlib"            -0.65
    " Salat"              -0.64
    " Padang"             -0.64
    " fior"               -0.63
    " Monfieur"           -0.62
    " Ueber"              -0.62
    Positive Logits
    Token                 Logit
    " We"                  1.66
    "We"                   1.61
    " we"                  1.57
    "we"                   1.38
    " WE"                  1.20
    " ourselves"           1.16
    "Мы"                   1.13
    " Мы"                  1.10
    " Weinstein"           1.04
    "WE"                   1.04
    Activation Density: 0.189%

    No Known Activations