    Explanations

    names of individuals associated with controversial topics or issues

    Negative Logits
    Token        Logit
    " iP"        -0.86
    " RELE"      -0.84
    " IMP"       -0.80
    " INTER"     -0.77
    " acron"     -0.77
    " ASC"       -0.76
    " supers"    -0.74
    " DIRECT"    -0.71
    " SAN"       -0.71
    " CONTR"     -0.69
    Positive Logits
    Token        Logit
    "inar"       1.12
    "idy"        1.10
    "aga"        1.09
    "axis"       1.05
    "ady"        1.00
    "ipal"       1.00
    "ucket"      1.00
    "arie"       0.98
    "ilus"       0.97
    "actor"      0.97
    Activation Density: 2.075%

    No Known Activations