INDEX
    Explanations

    statements expressing extreme prejudice or discrimination

    Negative Logits
     reluct     -2.16
     encomp     -2.16
     increa     -2.13
     guarante   -2.06
     fuf        -2.02
     volunte    -2.01
     inev       -2.00
     embra      -1.99
     depic      -1.97
     emphat     -1.90
    Positive Logits
     etc        0.99
                0.96
    ...         0.95
    .           0.88
    !           0.86
    ;           0.85
    ,           0.85
    ....        0.84
     too        0.84
    ?           0.82
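    The Negative Logits and Positive Logits lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way such lists are produced (an assumption here, not confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below shows that idea on toy data; the function names logit_effects and top_bottom_k are hypothetical, and the matrix layout (d_model rows by vocab columns) is assumed.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Hypothetical helper: dot the feature's decoder direction with every
    column of the unembedding matrix.

    Assumed layout: unembed[i][t] is the weight from residual dimension i
    to vocabulary token t (d_model rows, vocab columns).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_bottom_k(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (most negative k, most positive k) tokens."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and 3-token vocabulary.
    decoder_dir = [1.0, -1.0]
    unembed = [[0.5, 2.0, -1.0],
               [1.0, 0.0, 1.0]]
    tokens = ["a", "b", "c"]
    effects = logit_effects(decoder_dir, unembed)
    print(effects)                            # [-0.5, 2.0, -2.0]
    print(top_bottom_k(effects, tokens, k=1))  # ([('c', -2.0)], [('b', 2.0)])
```

    In a real dashboard the vocabulary has tens of thousands of tokens, so a vectorized matrix product would replace the nested loops; the ranking step is the same.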
    Activation Density 0.391%
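    The activation density figure above (0.391%) is usually read as the share of sampled token positions on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold and the sampling procedure are assumptions, not stated on this page), it can be sketched as follows; activation_density is a hypothetical name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds
    `threshold` (assumed to be 0.0, i.e. the feature is active at all)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 391 active positions out of 100,000 gives a density of 0.391%.
    sample = [0.0] * 99_609 + [1.0] * 391
    print(round(activation_density(sample), 3))  # 0.391
```

    A density this low means the feature is silent on the vast majority of tokens, which is typical for a narrowly scoped feature.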

    No Known Activations