    Explanations

    words related to misinformation and deception

    phrases related to falsehoods or misinformation

    Negative Logits
    Token                   Logit
    "hens"                  -1.03
    " guiActiveUnfocused"   -0.92
    "hetti"                 -0.88
    "ajo"                   -0.79
    "mun"                   -0.77
    "aldo"                  -0.76
    "ODY"                   -0.75
    "xual"                  -0.75
    "APTER"                 -0.73
    "rador"                 -0.72

    Positive Logits
    Token                   Logit
    " positives"            1.02
    " accuser"              0.85
    " guiActiveUn"          0.84
    " false"                0.82
    " dich"                 0.81
    " falsely"              0.76
    " unfocusedRange"       0.75
    " guiIcon"              0.73
    " negatives"            0.72
    " assumptions"          0.72
    Activation Density: 0.019%

    No Known Activations