    Negative Logits
    Token         Logit
    "."           -0.49
    ":"           -0.48
    " of"         -0.46
    "'"           -0.45
    "?"           -0.44
                  -0.44
    "("           -0.42
    ","           -0.42
    " B"          -0.41
    "!"           -0.41
    Positive Logits
    Token         Logit
    "OGND"        1.02
    " Efq"        1.02
    " purpoſe"    0.99
    " myſelf"     0.96
    " pleaſure"   0.94
    " Monfieur"   0.93
    " faſt"       0.92
    " fevere"     0.90
    " houſe"      0.90
    "ſelf"        0.89
    Activation Density: 0.061%

    No Known Activations