    Explanations

    claims related to political discourse and their credibility

    Negative Logits
    Token               Logit
    " noDo"             -0.51
    "oplayer"           -0.43
    " imitating"        -0.36
    " חוש"              -0.35
    " Photocase"        -0.35
    "Πηγές"             -0.35
    "Anhalt"            -0.34
    " inconspicuous"    -0.34
    "tomation"          -0.34
    "ilingual"          -0.34
    Positive Logits
    Token               Logit
    " fiction"          0.79
    " fabrication"      0.71
    " unsub"            0.71
    " fantasy"          0.71
    " fanciful"         0.70
    " fic"              0.66
    " unsupported"      0.65
    " base"             0.65
    " fabricated"       0.65
    "fiction"           0.63
    Activation Density: 0.920%

    No Known Activations