INDEX
    Explanations
    NEGATIVE LOGITS
     Efq          -1.19
     Theſe        -0.99
    AndEndTag     -0.97
     myſelf       -0.96
     ―――――        -0.96
     Monfieur     -0.93
     Jefus        -0.90
     themſelves   -0.86
     Anſ          -0.85
     ſeveral      -0.85
    POSITIVE LOGITS
    i             0.81
    ies           0.59
    ii            0.58
    ip            0.50
    ist           0.49
    ed            0.49
    ik            0.47
    ib            0.47
    ih            0.46
    ian           0.46
    Activation Density: 0.062%

    No Known Activations