    Explanations

    phrases associated with deception and misinformation

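    Negative Logits
    (tokens are quoted; a leading space inside the quotes is part of the token)
    "imanapun"    -0.70
    " vieles"     -0.56
    "TAGS"        -0.54
    "chmal"       -0.50
    "vertret"     -0.49
    " '@/"        -0.49
    "bitField"    -0.48
    " trovo"      -0.47
    "htë"         -0.47
    " سياس"       -0.47

    Positive Logits
    (tokens are quoted; a leading space inside the quotes is part of the token)
    " us"          0.90
    " everyone"    0.84
    " readers"     0.83
    " customers"   0.80
    " audiences"   0.79
    " everybody"   0.75
    " you"         0.74
    " consumers"   0.74
    "everyone"     0.73
    " people"      0.73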
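    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way to produce such a list, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder vector with each token's unembedding vector and keep the k largest and k smallest results. The sketch below uses hypothetical toy data (a 3-dimensional model, a 4-token vocabulary) and ignores any normalization a real pipeline might apply; `logit_effects` and `top_k` are illustrative names, not a library API.

```python
from typing import Dict, List, Tuple


def logit_effects(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of a feature on each token's logit: the dot product of the
    feature's decoder vector with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Hypothetical toy data: a 3-dimensional model and a 4-token vocabulary.
decoder_vec = [1.0, 0.0, -1.0]
unembed = {
    " us":      [0.9, 0.1, 0.0],
    " readers": [0.5, 0.0, -0.3],
    "TAGS":     [0.0, 0.2, 0.5],
    "chmal":    [-0.2, 0.0, 0.4],
}

positive, negative = top_k(logit_effects(decoder_vec, unembed), k=2)
print("Positive:", [(tok, round(v, 2)) for tok, v in positive])
print("Negative:", [(tok, round(v, 2)) for tok, v in negative])
```

    With a real model the same ranking would run over the full vocabulary, which is why unrelated subword pieces such as "chmal" or "bitField" can surface among the negative logits.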
    Activation Density: 0.622%
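    Activation density is read here as the percentage of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero (a natural threshold for ReLU-style features); the dashboard's exact threshold and token sample are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 622 of 100,000 tokens.
sample = [1.3] * 622 + [0.0] * (100_000 - 622)
print(f"Activation density: {activation_density(sample):.3f}%")
```

    At 0.622%, the feature is active on roughly one token in 160.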

    No Known Activations