INDEX
    Explanations
    Negative Logits
    Token        Logit
    " of"        -0.77
    "-"          -0.60
    " M"         -0.55
    "."          -0.54
                 -0.52
    " B"         -0.52
    " action"    -0.50
    " Mat"       -0.50
    " …"         -0.50
    "M"          -0.50
    Positive Logits
    Token          Logit
    " Monfieur"    1.18
    " myſelf"      1.15
    " houſe"       1.05
    " itſelf"      1.05
    " pleaſure"    1.03
    " ſever"       1.02
    " Efq"         1.02
    " Anſ"         0.97
    " himſelf"     0.96
    " Theſe"       0.96
    Activation Density: 1.652%

    No Known Activations