INDEX
    Explanations

    phrases related to manipulation and deception

    Negative Logits
    Token            Logit
    " the"           -0.39
    "WebVitals"      -0.38
    "usercontent"    -0.33
    " dry"           -0.31
    " Rising"        -0.31
    " elkaar"        -0.31
    "zeichnen"       -0.31
    " onderhoud"     -0.30
    " հղումներ"      -0.30
    " native"        -0.30
    Positive Logits
    Token              Logit
    " betweenstory"    0.65
    "ValueStyle"       0.61
    " surla"           0.59
    "AndEndTag"        0.59
    " terpaksa"        0.58
    " Forced"          0.55
    " tricked"         0.55
    "Forced"           0.55
    " coer"            0.54
    "forced"           0.52
    Activation Density: 0.019%

    No Known Activations