INDEX
    Explanations

    instances of the word "wrong" and related expressions indicating mistakes or moral failings

    Negative Logits
    Token        Logit
    ".qual"      -0.16
    "اÙĨÙĪ"       -0.15
    "anki"       -0.15
    "lesi"       -0.15
    "arters"     -0.15
    "lsa"        -0.15
    "illet"      -0.15
    "/cli"       -0.15
    "apa"        -0.14
    "cla"        -0.14

    Positive Logits
    Token        Logit
    "headed"     0.43
    "fully"      0.40
    "-headed"    0.37
    "/right"     0.30
    "er"         0.30
    "ed"         0.30
    " wrong"     0.29
    "wrong"      0.28
    "eous"       0.27
    "est"        0.27

    Activation Density: 0.066%

    No Known Activations