INDEX
    Explanations

    statements related to truthfulness

    occurrences of the word "truth."

    Negative Logits
    "uled"        -0.86
    "wana"        -0.77
    "capacity"    -0.75
    "avy"         -0.73
    " Rivals"     -0.70
    "jin"         -0.67
    "urations"    -0.67
    "joining"     -0.66
    "ATA"         -0.66
    "oyal"        -0.65
    Positive Logits
    "fulness"     1.24
    "fully"       1.06
    "psons"       0.92
    "lessly"      0.80
    " seeker"     0.79
    " truth"      0.78
    "ulent"       0.78
    " serum"      0.76
    "iness"       0.76
    "telling"     0.75
    Act Density 0.014%

    No Known Activations