INDEX
    Explanations

    words related to social interactions or behaviors

    terms related to social behavior

    Negative Logits
    "nces"            -0.84
    "atche"           -0.79
    "xual"            -0.75
    "ilts"            -0.74
    "gran"            -0.72
    "1001"            -0.71
    "ras"             -0.70
    "gger"            -0.70
    "shall"           -0.70
    "oning"           -0.69

    Positive Logits
    " norms"           0.96
    " interaction"     0.95
    " interactions"    0.95
    " cues"            0.89
    "ized"             0.84
    " relations"       0.82
    " gatherings"      0.81
    "izing"            0.78
    "istic"            0.77
    " affili"          0.77

    Act Density 0.025%

    No Known Activations