INDEX
    Explanations

    terms related to tolerance policies, especially in the context of governance and behavior regulation

    Head Attr Weights
    0:0.03
    1:0.01
    2:0.10
    3:0.07
    4:0.10
    5:0.03
    6:0.05
    7:0.36
    8:0.03
    9:0.04
    10:0.09
    11:0.05
    Negative Logits
    estamp  -1.92
    window  -1.70
     prototypes  -1.54
    alter  -1.53
    hook  -1.51
     Ago  -1.51
    -1.49
    ements  -1.48
    prints  -1.47
    -1.46
    Positive Logits
     cruelty  1.79
     Violence  1.72
     harassment  1.71
     abuse  1.69
     homophobia  1.66
     manslaughter  1.66
     dealing  1.65
     criminally  1.64
     racism  1.63
     discrimination  1.63
    Act Density 0.000%

    No Known Activations