    Negative Logits
    Token      Logit
    " my"      -0.47
    " me"      -0.46
    " an"      -0.43
    " his"     -0.42
    " a"       -0.42
    " red"     -0.41
    " meny"    -0.41
    "]}$"      -0.40
    ")))\n"    -0.40
    "]))\n"    -0.39
    Positive Logits
    Token        Logit
    " Houſe"     1.37
    " Majefty"   1.37
    " Reſ"       1.29
    " Anſ"       1.23
    " Efq"       1.22
    "ſelf"       1.20
    " Jefus"     1.20
    " itſelf"    1.20
    " Diſ"       1.19
    " Conſ"      1.13
    Activation Density: 1.393%

    No Known Activations