    Negative Logits
    Token     Logit
    "/"       -2.06
    ","       -1.31
    "."       -1.16
    " ("      -1.06
    " or"     -0.99
    "-"       -0.94
    " “"      -0.91
    "("       -0.85
    " a"      -0.82
    " and"    -0.82

    Positive Logits
    Token          Logit
    "<bos>"        2.14
    " myſelf"      1.80
    " Theſe"       1.79
    " itſelf"      1.69
    " Monfieur"    1.65
    " auffi"       1.55
    " ainfi"       1.53
    "ſelf"         1.51
    " Reſ"         1.48
    " themſelves"  1.47

    Activation Density: 0.460%
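
    Activation density is commonly read as the percentage of token positions on which the feature fires; 0.460% works out to roughly 46 of every 10,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero, and the helper name activation_density is hypothetical; the dashboard's actual threshold and sampled corpus are not stated here.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater than
    the threshold (0.0 by default); the dashboard may use a different rule.
    """
    if not activations:
        return 0.0  # no tokens observed, so no density to report
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 1,000 token positions, 4 of which have a positive activation.
acts = [0.0] * 995 + [1.2, 0.3, 2.5, 0.0, 0.7]
print(f"{activation_density(acts):.3f}%")  # prints "0.400%"
```

    A sparse feature like this one will show a density well under 1%, which is why the density is reported as a percentage with three decimals.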

    No Known Activations