INDEX
    Explanations

    negative contractions and phrases expressing doubt or uncertainty

    Negative Logits
    Token             Logit
    "<bos>"           -1.56
    "leſs"            -0.98
    " againſt"        -0.98
    "ſelf"            -0.98
    "ſelves"          -0.93
    " iſt"            -0.92
    " doubtnut"       -0.92
    " Anſ"            -0.91
    "rungsseite"      -0.91
    " leſs"           -0.91

    Positive Logits
    Token             Logit
    " didn"            0.85
    " it"              0.78
    " doesn"           0.73
    " wasn"            0.73
    " It"              0.71
    " shouldn"         0.69
    " I"               0.69
    " don"             0.69
    " a"               0.67
    " wouldn"          0.66
    Activation Density: 0.063%

    No Known Activations