INDEX
    Explanations
    Negative Logits
        " ac"          -1.12
        "ac"           -1.05
        " Ac"          -1.02
        " AC"          -0.83
        " Art"         -0.80
        "Ac"           -0.78
        " ("           -0.67
        " acc"         -0.67
        " original"    -0.66
        "Art"          -0.66
    Positive Logits
        " Houſe"       1.59
        " Theſe"       1.52
        " Diſ"         1.46
        " Jefus"       1.45
        " Efq"         1.45
        " Majefty"     1.42
        " myſelf"      1.41
        " Anſ"         1.38
        " himſelf"     1.37
        " Reſ"         1.35
    Activation Density: 0.368%

    No Known Activations