INDEX
    Explanations

    words that denote skepticism, questioning, or criticism

    Negative Logits
    Token          Logit
    "oother"       -0.78
    "ilege"        -0.76
    "rio"          -0.73
    "obook"        -0.72
    "hner"         -0.71
    "ynthesis"     -0.69
    "endment"      -0.69
    "psey"         -0.68
    "umbn"         -0.68
    "othal"        -0.67
    Positive Logits
    Token            Logit
    " enough"        0.95
    " ones"          0.80
    " amounts"       0.79
    " huh"           0.76
    " alike"         0.76
    " sounding"      0.71
    " indeed"        0.71
    " nonetheless"   0.71
    " since"         0.70
    " strokes"       0.70
    Activation Density: 0.192%

    No Known Activations