INDEX
    Explanations

    references to fake news or misinformation

    Negative Logits
     ब्रेकडाउन    -0.40
    postIndex    -0.37
    MessageTagHelper    -0.36
     gyhoeddwyd    -0.34
     identidad    -0.34
    balleur    -0.34
    ấp    -0.33
     wikipagina    -0.32
    identité    -0.32
    pegno    -0.31

    Positive Logits
     exaggerate    0.49
     exaggerated    0.49
     exagger    0.48
     exaggeration    0.48
     exaggerating    0.48
     inflated    0.45
     exager    0.44
     المعيارى    0.44
    Pad    0.43
     faulty    0.43
    Activation Density: 0.897%

    No Known Activations