INDEX
    Explanations

    statements and claims regarding truthfulness and accuracy in various contexts

    Negative Logits
    " clearfix"   -0.15
    "chie"        -0.15
    " Fraud"      -0.15
    "ocom"        -0.14
    " klar"       -0.14
    "obot"        -0.14
    "icas"        -0.14
    " Remed"      -0.14
    " Cheat"      -0.13
    "onet"        -0.13
    Positive Logits
    " accurate"   0.41
    " accuracy"   0.39
    " correct"    0.39
    " true"       0.35
    " accur"      0.34
    " Accuracy"   0.34
    "accur"       0.31
    "accuracy"    0.31
    "true"        0.31
    " truth"      0.30
    Activation Density: 0.187%

    No Known Activations