INDEX
    Explanations

    phrases related to truth or accuracy

    statements about truth or verification

    Negative Logits
    Token       Logit
    "ercise"    -0.80
    "isphere"   -0.75
    "igan"      -0.74
    "reau"      -0.66
    "cloth"     -0.65
    " Habit"    -0.63
    "wal"       -0.62
    "boy"       -0.58
    "aban"      -0.58
    "lease"     -0.56
    Positive Logits
    Token       Logit
    " true"     3.66
    "true"      2.82
    "True"      2.22
    " TRUE"     2.15
    " True"     2.11
    " false"    1.85
    "false"     1.77
    " truth"    1.60
    " untrue"   1.47
    "False"     1.44
    Activation Density: 0.027%

    No Known Activations