    Negative Logits
    Token         Logit
    " pleaſure"   -1.06
    " itſelf"     -0.98
    " Monfieur"   -0.94
    " Jefus"      -0.89
    "?),"         -0.89
    " ſy"         -0.87
    " Majefty"    -0.86
    "?)."         -0.84
    " ſche"       -0.84
    " muſt"       -0.83
    Positive Logits
    Token         Logit
    " be"         0.67
    " continue"   0.60
    " prepare"    0.59
    "awaiter"     0.57
    " assist"     0.55
    " manage"     0.55
    " support"    0.52
    " help"       0.52
    " not"        0.51
    " even"       0.51
    Activation Density: 0.011%

    No Known Activations