INDEX
    Explanations

    directives or rules

    expressions related to rules and guidelines in communication, particularly in social media contexts

    Negative Logits
        Token                 Logit
        " convergence"        -0.72
        " refurb"             -0.68
        " pioneering"         -0.64
        " upgr"               -0.62
        " staggered"          -0.61
        " estimated"          -0.61
        "trak"                -0.60
        " unparalleled"       -0.60
        " millenn"            -0.60
        " Baz"                -0.59

    Positive Logits
        Token                 Logit
        " anymore"             1.07
        " inappropriately"     0.97
        " nor"                 0.93
        " unnecessarily"       0.85
        "iquette"              0.85
        " disrespectful"       0.84
        ".?"                   0.83
        " whatsoever"          0.81
        " unless"              0.80
        " slurs"               0.79

    Activation Density: 0.915%

    No Known Activations