    Negative Logits
    Token             Logit
    " and"            -0.63
    "<bos>"           -0.55
    " '"              -0.54
    ":✨"             -0.53
    " $"              -0.52
    " so"             -0.52
    (not rendered)    -0.50
    " ​"              -0.50
    "'"               -0.50
    " non"            -0.49
    Positive Logits
    Token             Logit
    " Efq"            1.31
    " myſelf"         1.21
    " Monfieur"       1.20
    " ainfi"          1.16
    " themſelves"     1.15
    " Jefus"          1.15
    " Reſ"            1.15
    " ſtate"          1.11
    " pleaſure"       1.11
    " leſs"           1.09
    Activation Density: 0.063%

    No Known Activations