INDEX
    Explanations

    phrases related to social interaction or communication

    instances of words related to friendly interactions or social engagement

    Negative Logits
     Bell       -0.66
     acqu       -0.64
     counter    -0.64
     envis      -0.63
     Ferr       -0.63
     su         -0.62
     patron     -0.62
     Cross      -0.56
     Dem        -0.55
     cross      -0.55
    Positive Logits
    atted        4.45
    ats          1.47
    ":["         1.23
    ioned        1.21
    uffed        1.19
    outed        1.13
    atter        1.09
    ouched       1.07
    ated         1.06
    ATS          1.05
    Act Density 0.018%

    No Known Activations