INDEX
    Negative Logits
    Token            Logit
    " itſelf"        -1.02
    " iſt"           -0.95
    " Anſ"           -0.94
    " Monfieur"      -0.94
    "BibitemShut"    -0.91
    " themſelves"    -0.90
    " Houſe"         -0.90
    " myſelf"        -0.90
    " ſever"         -0.89
    "ſelves"         -0.89
    Positive Logits
    Token            Logit
    "a"              0.73
    "e"              0.63
    " C"             0.52
    "i"              0.51
    "o"              0.51
    " forma"         0.49
    "ly"             0.48
    "tf"             0.47
    " id"            0.47
    " R"             0.47
    Activation Density: 1.667%

    No Known Activations