INDEX
    Explanations

    words related to honesty and sincerity

    Negative Logits
    " desir"      -1.27
    " increa"     -1.25
    " disagre"    -1.22
    " thut"       -1.22
    " affor"      -1.22
    " reluct"     -1.21
    " accla"      -1.21
    " ?..."       -1.18
    " emphat"     -1.17
    " effe"       -1.17
    Positive Logits
    " honest"      1.17
    " honesty"     1.05
    " Honest"      0.97
    "honest"       0.94
    "Honest"       0.92
    " honestly"    0.70
    "<bos>"        0.66
    " truth"       0.65
    " Honesty"     0.58
    " ehrlich"     0.57
    Activation Density 0.052%

    No Known Activations