    Negative Logits
    Token     Logit
    ","       -0.55
    " at"     -0.53
    "."       -0.53
    " of"     -0.50
    ":"       -0.50
    " on"     -0.48
    " for"    -0.48
    " that"   -0.46
    " in"     -0.46
    " to"     -0.45
    Positive Logits
    Token           Logit
    "ſelves"        1.00
    " Drapeau"      0.98
    " itſelf"       0.93
    " myſelf"       0.90
    " doubtnut"     0.88
    " Shakspeare"   0.87
    " Parthen"      0.86
    " poffible"     0.86
    " ſeveral"      0.85
    " Jefus"        0.85
    Activation Density: 0.036%

    No Known Activations