    Negative Logits
    Token          Logit
    " itſelf"      -2.66
    " Efq"         -2.59
    " myſelf"      -2.44
    " Monfieur"    -2.34
    " Jefus"       -2.33
    " Majefty"     -2.31
    " raiſ"        -2.25
    " Reſ"         -2.19
    " ſeveral"     -2.17
    " Theſe"       -2.17
    Positive Logits
    Token          Logit
    [missing]      1.43
    " I"           1.11
    " C"           1.04
    " D"           0.99
    " B"           0.98
    " O"           0.98
    " P"           0.98
    ","            0.97
    " ("           0.96
    " M"           0.96
    Activation Density: 0.339%

    No Known Activations