    Explanations

    references to political figures and their statements concerning discrimination and freedom of speech

    Head Attr Weights
    0:0.07
    1:0.01
    2:0.06
    3:0.08
    4:0.04
    5:0.09
    6:0.07
    7:0.06
    8:0.37
    9:0.04
    10:0.03
    11:0.03
    Negative Logits
    " Quincy"      -3.31
    " Lowell"      -3.15
    " Syracuse"    -3.10
    "acci"         -3.00
    "arnaev"       -2.98
    " Sweeney"     -2.97
    " Erie"        -2.93
    "COMPLE"       -2.88
    " Corvette"    -2.86
    " Rhode"       -2.85
    Positive Logits
    "Dutch"           4.85
    " Netherlands"    4.60
    " Dutch"          4.42
    " ko"             3.91
    " Ajax"           3.85
    " PV"             3.70
    " Johannes"       3.67
    " Farage"         3.60
    " Danish"         3.60
    " Denmark"        3.58
    Act Density 0.003%

    No Known Activations