INDEX
    Negative Logits
    Token          Logit
    " itſelf"      -1.31
    " raiſ"        -1.28
    " Reſ"         -1.27
    " myſelf"      -1.25
    " Monfieur"    -1.24
    " Efq"         -1.21
    " pleaſure"    -1.18
    " tranſ"       -1.15
    " Diſ"         -1.15
    " ſche"        -1.14

    Positive Logits
    Token          Logit
    "."            0.60
    "</em>"        0.56
    "?"            0.52
    ","            0.50
    "</i>"         0.50
    "!"            0.49
    ":"            0.49
    " ("           0.48
    " G"           0.47
    "yo"           0.46

    Activation Density: 0.021%

    No Known Activations