    Negative Logits
    Token               Logit
    " itſelf"           -1.50
    " myſelf"           -1.38
    " Monfieur"         -1.34
    " Theſe"            -1.27
    " Jefus"            -1.27
    " Efq"              -1.27
    " pleaſure"         -1.26
    "ſelf"              -1.22
    "ſelves"            -1.22
    " Anſ"              -1.22

    Positive Logits
    Token               Logit
    " and"               0.77
    " of"                0.71
    " in"                0.70
    [no token shown]     0.65
    ","                  0.65
    " to"                0.63
    " with"              0.63
    "."                  0.63
    " by"                0.62
    " from"              0.61

    Activation Density: 0.070%

    No Known Activations