    Explanations

    Instances of negative commentary or criticism about societal issues, particularly regarding gender and violence.

    Head Attribution Weights
    Head | Weight
    0    | 0.02
    1    | 0.02
    2    | 0.10
    3    | 0.07
    4    | 0.17
    5    | 0.02
    6    | 0.17
    7    | 0.14
    8    | 0.03
    9    | 0.03
    10   | 0.08
    11   | 0.09
    Negative Logits
    Token                    | Logit
    BuyableInstoreAndOnline  | -1.71
    isSpecialOrderable       | -1.55
    okane                    | -1.52
    Reviewed                 | -1.47
    enthal                   | -1.38
    Spot                     | -1.37
    izons                    | -1.36
     Parkway                 | -1.34
    zeb                      | -1.31
    eele                     | -1.31
    Positive Logits
    Token        | Logit
     nonsense    | 1.54
     tricks      | 1.49
     syll        | 1.47
     backwards   | 1.47
     Bastard     | 1.44
     notation    | 1.40
    ]}           | 1.36
    amn          | 1.36
    .'"          | 1.35
     misc        | 1.35
    Activation Density: 0.001%

    No Known Activations