    Negative Logits
    Token         Logit
     ta           -0.73
     Li           -0.69
     Mar          -0.69
     di           -0.69
     pos          -0.69
     Ben          -0.68
     var          -0.68
     Gor          -0.68
     J            -0.68
     si           -0.68
    Positive Logits
    Token         Logit
     itſelf        1.63
     myſelf        1.52
     Monfieur      1.45
     Majefty       1.45
     Efq           1.41
     houſe         1.38
     Houſe         1.38
     themſelves    1.34
     pleaſure      1.32
     himſelf       1.30
    Activation Density: 0.088%

    No Known Activations