INDEX
    Explanations
    Negative Logits
    Token         Logit
    " Take"       -0.85
    " ("          -0.75
    ","           -0.73
    " It"         -0.65
    " A"          -0.64
    " ["          -0.61
    " No"         -0.61
    " In"         -0.60
    " Ter"        -0.60
    " for"        -0.59
    Positive Logits
    Token         Logit
    " Eſ"         1.27
    " Reſ"        1.23
    " ſeveral"    1.20
    " Conſ"       1.19
    " Diſ"        1.17
    "ſelf"        1.16
    " Efq"        1.16
    " iſt"        1.14
    " Monfieur"   1.14
    "ſelves"      1.13
    Activation Density: 0.077%

    No Known Activations