INDEX
    Explanations

    mathematical and logical structures or concepts

    Negative Logits
    Token        Logit
    "ekim"       -0.17
    "aidu"       -0.16
    "ÃĹ↵↵"       -0.16
    "kek"        -0.15
    "artz"       -0.15
    "cheon"      -0.14
    "addy"       -0.14
    "quam"       -0.14
    "igham"      -0.14
    "ãĥ¼ãĥĨ"     -0.13
    Positive Logits
    Token           Logit
    " Nat"          0.25
    " nat"          0.24
    "Nat"           0.22
    "nat"           0.20
    " destruct"     0.19
    " induction"    0.18
    " Tactics"      0.18
    " tactic"       0.17
    "wf"            0.17
    " Wells"        0.17
    Activation Density: 0.003%

    No Known Activations