INDEX
    Explanations

    References to falsehoods, lies, and misleading statements in the context of honesty and integrity.

    NEGATIVE LOGITS
    Token            Logit
    "BorderColor"    -0.50
    "onha"           -0.49
    "жкой"           -0.48
    "ngths"          -0.46
    "affinity"       -0.46
    " uygun"         -0.46
    "нибудь"         -0.46
    "ırken"          -0.43
    "اعمال"          -0.43
    " prefer"        -0.43
    POSITIVE LOGITS
    Token            Logit
    " falsehood"     1.07
    " liar"          1.03
    " perjury"       0.94
    " lied"          0.93
    " liars"         0.92
    "Lies"           0.92
    " Lies"          0.88
    " untrue"        0.88
    " lies"          0.88
    " false"         0.88
    Activation Density: 0.359%

    No Known Activations