INDEX
    Explanations
    Negative Logits
     myſelf      -1.45
     itſelf      -1.37
     Reſ         -1.36
     faſt        -1.32
     Theſe       -1.32
     Diſ         -1.30
     Efq         -1.30
     houſe       -1.28
     Houſe       -1.28
     Monfieur    -1.28
    Positive Logits
     I           1.16
     the         0.81
     we          0.80
    en           0.77
     The         0.76
     in          0.75
     for         0.74
     a           0.72
     as          0.72
    .            0.72
    Activation Density: 0.035%

    No Known Activations