INDEX
    Explanations

    phrases related to misinformation, deception, and false information

    Negative Logits
    Token                    Logit
    "hens"                   -0.75
    "anse"                   -0.74
    "foreseen"               -0.72
    "arya"                   -0.68
    "area"                   -0.68
    "aldo"                   -0.67
    "foundation"             -0.67
    " guiActiveUnfocused"    -0.67
    "illes"                  -0.67
    "winner"                 -0.67
    Positive Logits
    Token          Logit
    "ument"        1.05
    "ulent"        0.96
    "ulence"       0.90
    " falsely"     0.88
    " excuse"      0.87
    " excuses"     0.84
    " pretext"     0.82
    " concoct"     0.77
    " pas"         0.75
    " false"       0.74
    Activation Density: 1.225%

    No Known Activations