INDEX
    Explanations
    NEGATIVE LOGITS
     habet         -0.82
    ſelf           -0.77
     againſt       -0.73
     myſelf        -0.72
     ſhe           -0.71
     becauſe       -0.70
     itſelf        -0.67
     Legendre      -0.67
     ſome          -0.66
     themſelves    -0.66
    POSITIVE LOGITS
    ian            0.58
    ier            0.54
    पया            0.52
    epar           0.50
    ize            0.50
    ious           0.49
    ians           0.49
    BARA           0.48
    Enllaces       0.47
    ized           0.47
    Activation Density: 0.087%

    No Known Activations