INDEX
    Explanations

    specific articles like "a," "an," and "the."

    Negative Logits
     myſelf      -1.39
     pleaſure    -1.37
     himſelf     -1.35
     purpoſe     -1.34
     itſelf      -1.29
     Jefus       -1.29
     Theſe       -1.29
     Monfieur    -1.28
     faſt        -1.27
     iſt         -1.23
    Positive Logits
                 0.65
     in          0.63
    .            0.56
    ,            0.55
     (           0.52
     to          0.51
     as          0.51
                 0.50
     with        0.49
     [           0.47
    Act Density 0.068%

    No Known Activations