INDEX
    Explanations
    Negative Logits
    "alah"           -0.97
    " myſelf"        -0.71
    " themſelves"    -0.68
    " itſelf"        -0.67
    " Shakspeare"    -0.65
    " Fasc"          -0.59
    " hefyd"         -0.59
    "felves"         -0.58
    " auffi"         -0.58
    " himſelf"       -0.57
    Positive Logits
    " defStyle"      0.52
    "CrossRef"       0.52
    " trajets"       0.47
    "MarshalTo"      0.47
    "εια"            0.46
    "iprot"          0.46
    "TUAL"           0.46
    " pad"           0.46
    " nahilalakip"   0.46
    "mpf"            0.45
    Activation Density: 0.009%

    No Known Activations