INDEX
    Explanations

    interactions on social media platforms

    Head Attr Weights
    0:0.13
    1:0.05
    2:0.06
    3:0.08
    4:0.06
    5:0.09
    6:0.09
    7:0.05
    8:0.08
    9:0.11
    10:0.10
    11:0.04
    Negative Logits
     pall:-1.03
     shred:-0.98
     understandable:-0.98
     vit:-0.97
    into:-0.97
    gd:-0.95
     foremost:-0.89
    lon:-0.88
     trumpet:-0.88
     Loren:-0.88
    Positive Logits
    yip:1.57
    Interstitial:1.38
    ebin:1.26
    ombies:1.26
     sidx:1.16
    LESS:1.16
     bookmark:1.13
     Pastebin:1.13
    cycles:1.12
     Cancel:1.05
    Act Density 0.005%

    No Known Activations