    Negative Logits
     myſelf        -1.20
     Monfieur      -1.09
     pleaſure      -1.06
     themſelves    -1.00
    <bos>          -0.97
     Shakspeare    -0.95
     purpoſe       -0.94
     Efq           -0.94
     itſelf        -0.93
     himſelf       -0.92
    Positive Logits
     }^{           0.47
    ]<=            0.47
     }_{           0.46
    ној            0.45
    (blank)        0.44
    ocer           0.43
    (blank)        0.43
    __[            0.42
    <eos>          0.41
     thức          0.41
    Act Density 0.264%

    No Known Activations