    NEGATIVE LOGITS
    Token    Logit
    "        -0.78
    </h3>    -0.77
    her      -0.68
    da       -0.68
    van      -0.67
    </h6>    -0.67
    t        -0.67
    «        -0.67
    et       -0.66
    im       -0.64
    POSITIVE LOGITS
    Token     Logit
    quæ       1.17
    Reſ       1.13
    ſever     1.13
    Perſ      1.12
    ſtate     1.12
    Anſ       1.10
    reaſon    1.10
    itſelf    1.10
    myſelf    1.09
    Diſ       1.09
    Activation Density: 0.137%

    No Known Activations