INDEX
    Explanations

    instances of inaccuracies or falsehoods in a text

    terms related to deceptive or incorrect information and its consequences

    Negative Logits
        "negie"         -0.77
        "oubted"        -0.74
        " GOODMAN"      -0.71
        "ICA"           -0.71
        "rolet"         -0.70
        "verning"       -0.70
        "ampions"       -0.68
        "ificantly"     -0.68
        "atorium"       -0.67
        "interrupted"   -0.66

    Positive Logits
        " syndrome"     1.01
        "glers"         0.92
        " Syndrome"     0.85
        " perpetrated"  0.80
        "manship"       0.73
        " imaginable"   0.69
        "Exception"     0.68
        "ulence"        0.67
        " mas"          0.67
        " practices"    0.66

    Act Density 0.369%

    No Known Activations