INDEX
    Explanations

    code package declarations

    Negative Logits
     itſelf       -1.18
     ſind         -1.07
     myſelf       -1.05
    ſelves        -1.01
     himſelf      -0.97
    ſelf          -0.97
     ſever        -0.97
     iſt          -0.95
     faſt         -0.94
     Efq          -0.93

    Positive Logits
    ↵↵↵            0.75
     {             0.71
    ↵↵             0.70
    ↵↵↵↵           0.68
     name          0.68
                   0.64
    ↵↵↵↵↵          0.62
     main          0.59
    ↵↵↵↵↵↵↵        0.59
    <eos>          0.59
    Activation Density 0.151%

    No Known Activations