INDEX
    Explanations

    repeated sequences of underscores or similar patterns

    Negative Logits
     a      -0.69
     I      -0.59
     my     -0.58
     it     -0.57
     the    -0.56
     all    -0.56
     an     -0.55
     S      -0.53
     at     -0.53
     i      -0.53
    Positive Logits
     pleaſure      1.50
     purpoſe       1.47
     Monfieur      1.47
     мәкал         1.46
     itſelf        1.42
     Majefty       1.39
     themſelves    1.38
     Reſ           1.37
     raiſ          1.36
     ſche          1.33
    Activation Density: 0.936%

    No Known Activations