INDEX
    Explanations

    concepts related to societal problems and moral issues

    Head Attr Weights
    0:0.06
    1:0.02
    2:0.04
    3:0.10
    4:0.04
    5:0.13
    6:0.02
    7:0.03
    8:0.07
    9:0.21
    10:0.17
    11:0.07
    Negative Logits
    "assad"                    -1.14
    "inav"                     -1.00
    "trak"                     -0.97
    "uania"                    -0.95
    "luaj"                     -0.94
    "yip"                      -0.88
    "idated"                   -0.86
    " Tus"                     -0.85
    "orest"                    -0.85
    "BuyableInstoreAndOnline"  -0.84
    Positive Logits
    " spoiler"                  1.22
    " paraph"                   1.04
    " Spoiler"                  1.02
    " spoilers"                 1.00
    " commenter"                0.99
    " Gawker"                   0.96
    " commenters"               0.96
    "oiler"                     0.95
    " Hume"                     0.94
    " rebutt"                   0.91
    Act Density 3.560%

    No Known Activations