INDEX
    Explanations

    negations and expressions of disagreement

    Negative Logits
     myſelf     -1.05
     Efq        -0.89
     uſed       -0.88
     itſelf     -0.87
     raiſ       -0.87
     ſeveral    -0.86
     Tacitus    -0.81
     Manchuria  -0.81
    Portály     -0.80
     purpoſe    -0.79
    Positive Logits
     is         0.99
     not        0.91
     Not        0.85
     WAS        0.79
    我不是      0.78
    not         0.76
     isn        0.73
    不是        0.73
    IsNot       0.71
     being      0.70
    Act Density 0.114%

    No Known Activations