INDEX
    Explanations

    words related to extreme behavior or actions, as well as aspects related to politeness and manners

    extreme behaviors and their impact

    Negative Logits
        Token             Logit
        " Started"        -0.75
        " Timeline"       -0.67
        " Completed"      -0.67
        " Previous"       -0.65
        " CM"             -0.65
        " Prior"          -0.63
        " Updated"        -0.62
        " Announce"       -0.60
        " Nationwide"     -0.58
        " Streaming"      -0.58
    Positive Logits
        Token             Logit
        " slightest"      0.79
        " lest"           0.76
        "sometimes"       0.70
        " occasionally"   0.69
        " occasional"     0.67
        " even"           0.67
        "anything"        0.66
        "azor"            0.64
        "ifling"          0.64
        "udder"           0.63
    Activation Density: 0.675%

    No Known Activations