INDEX
    Explanations

    references to physical pain or injury

    Negative Logits
     myſelf      -1.87
     pleaſure    -1.85
     Efq         -1.80
     ſeveral     -1.79
     Monfieur    -1.79
     ―――――       -1.78
     purpoſe     -1.76
     itſelf      -1.75
     houſe       -1.75
     Majefty     -1.74
    Positive Logits
    [token missing]  0.84
    .            0.80
     K           0.77
     en          0.76
     C           0.75
     R           0.72
    <eos>        0.72
    tak          0.72
     in          0.70
     V           0.70
    Act Density 0.110%

    No Known Activations