INDEX
    Explanations

    words related to a feeling of moral or factual incorrectness

    statements indicating something is considered incorrect or morally wrong

    Negative Logits
    "rien"      -0.66
    "¯¯¯¯"      -0.65
    "ribes"     -0.63
    "cit"       -0.63
    "anned"     -0.62
    "apple"     -0.62
    "glas"      -0.62
    "usters"    -0.61
    "ility"     -0.61
    "é¾"        -0.60
    Positive Logits
    " wrong"            1.02
    " unfocusedRange"   0.89
    "wrong"             0.86
    " culprit"          0.80
    "fully"             0.77
    "ibrary"            0.75
    " mistaken"         0.74
    " Wrong"            0.74
    " tack"             0.73
    "eous"              0.70
    Activation Density: 0.014%

    No Known Activations