    Explanations

    phrases related to the negative impact of certain policies and actions

    Negative Logits
    oppers
    -0.15
    řez
    -0.14
    vb
    -0.14
     Demir
    -0.14
    928
    -0.14
    鉄
    -0.14
    odom
    -0.13
    ама
    -0.13
    Ư
    -0.13
    oriously
    -0.13
    Positive Logits
     increase
    0.19
     only
    0.19
     instead
    0.19
     pand
    0.17
    increase
    0.17
     вмест
    0.17
     Increase
    0.16
     Div
    0.16
     Only
    0.16
     increased
    0.16
    Act Density 0.321%

    No Known Activations