INDEX
    Explanations
    Negative Logits
     Efq         -1.20
     ſever       -1.16
     Diſ         -1.13
     Reſ         -1.11
     Monfieur    -1.09
     faſt        -1.08
     Eſ          -1.05
     auffi       -1.04
     pleaſure    -1.00
     houſe       -1.00
    Positive Logits
     l            0.60
     to           0.59
                  0.58
     f            0.56
     as           0.56
     of           0.56
     d            0.55
     in           0.54
     an           0.54
     der          0.54
    Act Density 0.123%

    No Known Activations