    Explanations

    phrases related to concerns, issues, or disruptions

    negative sentiments or issues related to safety and disruptions

    Negative Logits
    Token        Logit
    "liam"       -0.71
    "eele"       -0.66
    "ortment"    -0.63
    "igham"      -0.60
    "fw"         -0.59
    "acha"       -0.58
    "itled"      -0.57
    "ku"         -0.57
    "tsky"       -0.56
    "ethyl"      -0.56
    Positive Logits
    Token            Logit
    " whatsoever"    1.72
    " nor"           1.57
    " anymore"       1.46
    "nor"            1.01
    " anything"      1.00
    " slightest"     0.98
    " anybody"       0.95
    " either"        0.95
    " anywhere"      0.93
    " except"        0.87
    Act Density 0.264%
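
To make the density figure concrete: assuming "Act Density" means the fraction of sampled tokens on which the feature has a nonzero activation (an assumption; the page does not define it), 0.264% works out to roughly 2.6 firing tokens per thousand. A minimal sketch of that arithmetic follows. The function name `act_density` and the toy activation list are hypothetical and are not taken from the page.

```python
def act_density(activations):
    """Fraction of tokens whose activation is strictly positive.

    Assumed definition; the dashboard itself does not spell it out.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Toy data (hypothetical): 3 firing tokens out of 1000.
toy = [0.0] * 997 + [0.4, 1.2, 0.05]
density = act_density(toy)

# Convert the reported 0.264% into expected firing tokens per 1000 tokens.
reported = 0.264 / 100
per_thousand = reported * 1000

print(f"toy density: {density:.3%}")
print(f"reported density is about {per_thousand:.2f} firing tokens per 1000")
```

Under this reading the feature is sparse, which fits a feature whose strongest positive logits are a narrow set of negative-polarity words.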

    No Known Activations