INDEX
    Explanations

    instances of the word "We" and its variants, indicating collective statements or intentions

    NEGATIVE LOGITS
    [token missing]  -0.20
    ._↵              -0.18
     âĢŀ             -0.17
     -↵              -0.16
    .*↵              -0.16
    --↵              -0.16
     --↵             -0.15
    -↵               -0.15
     '↵              -0.15
     —↵              -0.15
    POSITIVE LOGITS
    ir      0.43
    apons   0.41
    bsite   0.41
    ng      0.38
    gether  0.37
    ek      0.37
    ather   0.36
    ory     0.35
    ide     0.32
    thing   0.32
    Activation Density: 0.110%

    No Known Activations