INDEX
    Explanations

    specific tokens that repeat in various contexts

    Negative Logits
    " myſelf"      -1.41
    " itſelf"      -1.35
    " Theſe"       -1.30
    " purpoſe"     -1.27
    " ſeveral"     -1.26
    " faſt"        -1.25
    " perſon"      -1.22
    " ſever"       -1.22
    " Monfieur"    -1.21
    "ſelf"         -1.21
    Positive Logits
    " h"    1.78
    " r"    1.73
    " b"    1.70
    " p"    1.67
    " m"    1.66
    " c"    1.64
    " d"    1.61
    " s"    1.59
    " k"    1.58
    " f"    1.58
    Activation Density: 0.405%

    No Known Activations