INDEX
    Explanations
    Negative Logits
    " ar"         -0.90
    " p"          -0.89
    "e"           -0.88
    " Ar"         -0.84
    " r"          -0.72
    "et"          -0.71
                  -0.69
    "es"          -0.67
    "ey"          -0.67
    " Ph"         -0.67
    Positive Logits
    " itſelf"     1.45
    " myſelf"     1.45
                  1.30
    " ſche"       1.28
    " Jefus"      1.20
    " houſe"      1.20
    " purpoſe"    1.20
    " pleaſure"   1.18
    " Efq"        1.17
    " auffi"      1.16
    Activation Density: 0.804%

    No Known Activations