INDEX
    Explanations

    mentions of physical removal or alteration in a social or political context

    Negative Logits
        " impractica"   -0.89
        " reluct"       -0.86
        " disagre"      -0.85
        " affor"        -0.84
        " impra"        -0.82
        " perfet"       -0.81
        " excru"        -0.81
        " scrat"        -0.79
        " uninten"      -0.79
        " Wtf"          -0.78
    Positive Logits
        " removal"      1.06
        " remove"       1.04
        " removed"      1.02
        "remove"        1.02
        " removes"      1.00
        "Remove"        0.98
        " Remove"       0.97
        " Removal"      0.94
        "removed"       0.94
        " removing"     0.90
    Activation Density: 0.148%
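    Activation density is the share of token positions on which the feature fires. A density of 0.148% corresponds to roughly 1.5 active tokens per thousand. Below is a minimal sketch of that arithmetic. It assumes a feature counts as active whenever its activation is above zero, which this page does not state. The function name `activation_density` and the activation values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical activations: 3 nonzero values among 2,000 token positions.
acts = [0.0] * 2000
acts[10], acts[500], acts[1999] = 1.7, 0.4, 2.3
print(f"{activation_density(acts):.3f}%")  # 0.150%
```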

    No Known Activations