INDEX
    Explanations

    discussions or phrases related to criticism or backlash

    Head Attr Weights
    0:0.01
    1:0.01
    2:0.10
    3:0.06
    4:0.10
    5:0.03
    6:0.03
    7:0.38
    8:0.04
    9:0.04
    10:0.10
    11:0.05
    Negative Logits
    "irlf"      -1.90
    "redo"      -1.78
    "arnaev"    -1.73
    "inth"      -1.70
    " Grave"    -1.65
    "raits"     -1.64
    "ixt"       -1.64
    "iece"      -1.64
    "otted"     -1.59
    "eros"      -1.58
    Positive Logits
    " endorsing"       2.18
    " antiv"           2.14
    " disapproval"     2.11
    " advis"           2.10
    " endorsement"     2.10
    " censorship"      2.08
    " behavi"          2.04
    " approving"       1.97
    " omission"        1.95
    " endorsements"    1.93
    Activation Density: 0.000%

    No Known Activations