INDEX
    Explanations

    occurrences of the word "we" indicating a collective perspective or action

    NEGATIVE LOGITS
     itself     -0.22
    was         -0.19
    ly          -0.17
    (s          -0.15
    aug         -0.15
    st          -0.14
    ctor        -0.14
    ocate       -0.14
    g           -0.13
    dez         -0.13
    POSITIVE LOGITS
     ourselves  0.41
    ’re         0.39
    're         0.36
    've         0.34
    ’ve         0.32
     are        0.31
    eping       0.28
    ходим       0.28
    'll         0.28
    ’ll         0.27
    Activation Density: 0.297%

    No Known Activations