    Negative Logits
    " itſelf"      -2.95
    " myſelf"      -2.88
    " ་་"          -2.75
    " ―――――"      -2.75
    " Efq"         -2.72
    "ſelf"         -2.67
    "ſelves"       -2.66
    " iſt"         -2.52
    " Majefty"     -2.52
    " ſeveral"     -2.52
    Positive Logits
    ","            1.19
    "."            1.16
    " ("           1.13
    "!"            1.13
    "<eos>"        1.09
    "'"            1.09
    " ["           1.08
    (not shown)    1.05
    (not shown)    1.01
    " k"           1.00
    Activation Density: 0.057%

    No Known Activations