INDEX
    Explanations

    statements emphasizing the importance of truth

    references to the concept of "truth."

    Negative Logits
    "uled"          -0.86
    "wana"          -0.81
    "alian"         -0.75
    "avy"           -0.71
    "avia"          -0.69
    "akings"        -0.69
    "joining"       -0.68
    "unes"          -0.67
    "orks"          -0.66
    "urations"      -0.65

    Positive Logits
    "fulness"        1.20
    "fully"          1.01
    " truth"         0.92
    "truth"          0.84
    " Truth"         0.82
    "lessly"         0.81
    " srfAttach"     0.79
    " seeker"        0.79
    "psons"          0.79
    "iness"          0.78

    Act Density 0.017%

    No Known Activations