INDEX
    Explanations

    phrases related to negative or harmful actions or behaviors, especially involving violence or illegal activities

    phrases related to abusive or predatory behavior

    Negative Logits
     univers      -0.76
    uitive        -0.72
    ahime         -0.71
     Revision     -0.68
    usterity      -0.68
    icult         -0.67
    rastructure   -0.67
     Effective    -0.65
    ircraft       -0.65
    ãĤ¤           -0.64
    Positive Logits
     stole          1.09
     proceeded      1.07
     drank          1.05
     subsequently   1.05
     ate            1.03
     secondly       1.03
     assaulted      1.03
     raped          1.02
     overheard      1.02
     interfered     1.02
    Act Density 0.345%

    No Known Activations