    Negative Logits
    ing      -0.52
    es       -0.52
    <eos>    -0.51
    ker      -0.49
    g        -0.48
    ers      -0.45
    er       -0.45
    ent      -0.45
    vern     -0.45
    ently    -0.45
    Positive Logits
     Majefty     1.17
    ^(@)         1.08
     Theſe       1.05
     Anſ         1.04
     itſelf      1.01
     Efq         1.00
     myſelf      0.98
    ſelf         0.96
     doubtnut    0.95
     Houſe       0.93
    Activation Density: 0.520%

    No Known Activations