INDEX
    Explanations

    statements or references related to claims of truthfulness

    Negative Logits
    " drowned"         -0.15
    "983"              -0.14
    " Bench"           -0.14
    "éļı"              -0.14
    "adox"             -0.14
    "ournals"          -0.13
    "antal"            -0.13
    "æ½"               -0.13
    " unsuccessful"    -0.13
    "iction"           -0.13

    Positive Logits
    " exposing"        0.20
    " expose"          0.18
    " exposure"        0.18
    " truth"           0.18
    " exposures"       0.17
    " exposes"         0.17
    "Expose"           0.17
    " wakeup"          0.17
    " readers"         0.16
    "Truth"            0.16
    Act Density 0.504%

    No Known Activations