INDEX
    Explanations

    references related to social issues, public statements, and inappropriate behavior

    Negative Logits
    Token             Logit
    " impractica"     -1.33
    " impra"          -1.25
    " unwarran"       -1.22
    " uninten"        -1.18
    " increa"         -1.18
    " thut"           -1.17
    " fta"            -1.17
    " disagre"        -1.15
    " reluct"         -1.14
    " ecru"           -1.13
    Positive Logits
    Token             Logit
    " unacceptable"   0.60
    " tolerance"      0.55
    " acts"           0.55
    " violence"       0.54
    " behavior"       0.54
    " zero"           0.52
    "ZERO"            0.51
    " anyone"         0.51
    "Zero"            0.50
    " behaviors"      0.50
    Activation Density: 0.417%

    No Known Activations