    Negative Logits
     ſever         -1.02
     myſelf        -0.98
     Jefus         -0.97
     pleaſure      -0.96
     Efq           -0.93
     fevere        -0.93
     Monfieur      -0.91
     uſ            -0.91
     Majefty       -0.85
     themſelves    -0.84
    Positive Logits
     “             0.85
     ‘             0.82
     ...           0.79
     '             0.76
    ...            0.73
     "             0.73
    ”,             0.70
                   0.69
     …             0.69
                   0.67
    Activation Density: 1.042%

    No Known Activations