INDEX
    Explanations

    phrases related to allegations or accusations of wrongdoing

    references to claims or statements of wrongdoing or misconduct

    Negative Logits
    ger         -0.68
    xual        -0.67
    ilation     -0.66
    atu         -0.66
    bern        -0.65
    ament       -0.65
    liv         -0.65
    ature       -0.64
    focus       -0.63
    heses       -0.63
    Positive Logits
     violated       0.76
     misrepresent   0.75
     allegedly      0.73
     metic          0.73
     infringing     0.72
     contradict     0.72
    Buyable         0.71
    æ©              0.71
     originated     0.71
     infring        0.71
    Act Density 0.006%

    No Known Activations