INDEX
    Explanations

    controversial or negative associations and actions related to various groups or individuals

    discussions of harmful societal issues and groups

    Negative Logits
    ERG             -0.66
     Ank            -0.63
    Sym             -0.62
    STRUCT          -0.61
    OIL             -0.60
    Sum             -0.59
    Vert            -0.57
    Shift           -0.57
    Sund            -0.57
    Var             -0.56
    Positive Logits
     respectively   0.87
     etc            0.72
    isine           0.71
    atics           0.70
    .''.            0.69
    .",             0.67
    .[              0.67
     backgrounds    0.66
     perpetrated    0.64
    ¥µ              0.64
    Act Density 0.629%

    No Known Activations