INDEX
    Explanations
    Negative Logits
    Token            Logit
    " myſelf"        -1.55
    " itſelf"        -1.52
    " Efq"           -1.48
    " ſeveral"       -1.38
    " himſelf"       -1.34
    " leaſt"         -1.34
    " Monfieur"      -1.34
    " houſe"         -1.33
    " pleaſure"      -1.33
    " themſelves"    -1.27
    Positive Logits
    Token            Logit
    " and"           0.65
    " &"             0.56
    (not rendered)   0.54
    " in"            0.53
    ","              0.47
    "and"            0.47
    (not rendered)   0.44
    " to"            0.44
    " ("             0.43
    "/"              0.43
    Activation Density: 0.042%

    No Known Activations