INDEX
    Explanations

    words related to social interactions and engagement

    Head Attr Weights
    0:0.01
    1:0.01
    2:0.09
    3:0.07
    4:0.17
    5:0.03
    6:0.08
    7:0.31
    8:0.03
    9:0.03
    10:0.06
    11:0.04
    Negative Logits
    "agos"       -2.09
    "rompt"      -1.91
    "ascript"    -1.89
    "ebted"      -1.81
    "iseum"      -1.79
    "uliffe"     -1.74
    "ancial"     -1.73
    "umar"       -1.70
    "entanyl"    -1.67
    "aunder"     -1.61
    Positive Logits
    " horizont"  1.82
    " thick"     1.58
    " circles"   1.58
    " Lego"      1.55
    " pics"      1.54
    "pole"       1.51
    " hither"    1.50
    " LINE"      1.50
    " thicker"   1.49
    " Moe"       1.47
    Act Density 0.001%

    No Known Activations