INDEX
    Explanations

    references to social or political commentary

    Head Attr Weights
    0:0.12
    1:0.45
    2:0.03
    3:0.03
    4:0.02
    5:0.12
    6:0.02
    7:0.01
    8:0.03
    9:0.06
    10:0.03
    11:0.03
    Negative Logits
    "SE"                -1.96
    "asc"               -1.93
    (token not shown)   -1.85
    " Seg"              -1.82
    (token not shown)   -1.81
    "spons"             -1.78
    "SA"                -1.73
    " Sailor"           -1.72
    (token not shown)   -1.70
    (token not shown)   -1.60
    Positive Logits
    "theless"     2.17
    "bage"        1.82
    "bish"        1.68
    " but"        1.65
    "etheless"    1.60
    "gered"       1.58
    "warts"       1.56
    "ebin"        1.56
    "ulous"       1.53
    "but"         1.50
    Act Density 0.016%

    No Known Activations