INDEX
    Explanations

    expressions related to truth-telling and the pursuit of knowledge

    Negative Logits
    Token         Logit
    "adic"        -0.17
    "ocop"        -0.15
    "illard"      -0.15
    " Stam"       -0.14
    "oogle"       -0.14
    "å½"          -0.14
    "neider"      -0.13
    "221"         -0.13
    "nul"         -0.13
    "ocab"        -0.13
    Positive Logits
    Token         Logit
    " truth"      1.05
    "truth"       0.93
    " Truth"      0.88
    "Truth"       0.82
    " truths"     0.77
    " verdad"     0.71
    "_truth"      0.65
    " truthful"   0.60
    ".truth"      0.55
    " true"       0.46
    Activation Density: 0.197%

    No Known Activations