INDEX
    Explanations

    terms and phrases related to fake news and deception

    Negative Logits
    Token               Logit
    " tarifa"           -0.69
    "حوالہ"             -0.69
    "اریخ"              -0.68
    " volando"          -0.68
    " chacun"           -0.65
    " parlant"          -0.65
    "timbangkan"        -0.64
    " pribadi"          -0.64
    " DriverManager"    -0.64
    " debout"           -0.64
    Positive Logits
    Token               Logit
    " fake"             1.39
    " Fake"             1.28
    " false"            1.23
    "fake"              1.22
    "Fake"              1.18
    " False"            1.15
    " fakes"            1.13
    "False"             1.13
    "false"             1.10
    " hoax"             1.06
    Activation Density: 0.280%

    No Known Activations