INDEX
    Explanations

    expressions of uncertainty or denial

    Negative Logits
    " were"        -1.13
    " are"         -0.85
    "were"         -0.84
    " weren"       -0.81
    " Were"        -0.75
    " WERE"        -0.73
    " don"         -0.72
    " voltak"      -0.71
    " aren"        -0.66
    "are"          -0.66
    Positive Logits
    " itſelf"      0.97
    " Monfieur"    0.84
    " Appears"     0.77
    " recognises"  0.76
    "appears"      0.75
    " penetrates"  0.74
    " Serves"      0.73
    "ftagPool"     0.73
    " Beſ"         0.73
    "does"         0.73
    Activation Density: 0.131%

    No Known Activations