INDEX
    Explanations

    phrases related to criticism or disapproval

    instances of criticism or accountability in discussions

    Negative Logits
    arnaev  -0.72
    nel  -0.72
     CrossRef  -0.70
     Starship  -0.69
     Sutherland  -0.69
    aea  -0.66
     Chung  -0.66
    nosis  -0.65
    usa  -0.65
    Translation  -0.64
    Positive Logits
     superiority  0.98
     blasphemy  0.83
     failures  0.83
     unfair  0.81
     injust  0.80
     daring  0.80
     frivolous  0.79
     piety  0.77
     accomplishments  0.76
     inaction  0.76
    Activation Density: 0.606%

    No Known Activations