    Explanations

    negations, particularly the word "isn't."

    NEGATIVE LOGITS
    Slf           -0.72
    does          -0.70
     -            -0.65
     does         -0.65
    did           -0.64
    i             -0.61
     EdgeInsets   -0.61
     DES          -0.60
     dos          -0.60
     \            -0.60
    POSITIVE LOGITS
     raiſ         1.10
     Anſ          0.99
     itſelf       0.96
     ſever        0.95
    ...');        0.93
     ſind         0.92
     Eſ           0.92
     iſt          0.87
     faſt         0.87
     myſelf       0.86
    Activation Density: 0.032%

    No Known Activations