INDEX
    Explanations

    negation and dismissive attitudes toward certain racial discussions

    negation and contractions

    Negative Logits
     autorytatywna    -0.59
    quelize           -0.58
    oa̍t               -0.54
     Wicidata         -0.53
    rrggbb            -0.51
    tagHelperRunner   -0.50
     فريبيس           -0.49
    ftagPool          -0.47
    تقاوى             -0.46
    XmlAccessorType   -0.46
    Positive Logits
     toxic            0.41
    SBATCH            0.40
    AutoField         0.37
    screen            0.36
     Auto             0.35
     angry            0.35
    fbc               0.35
    DropColumn        0.35
     blot             0.35
    Axis              0.35
    Activation Density: 0.076%

    No Known Activations