INDEX
    Explanations
    Negative Logits
    Token             Logit
    "-zone"           -0.07
    "bject"           -0.07
    "-cluster"        -0.07
    "ale"             -0.07
    "acent"           -0.07
    "_Abstract"       -0.06
    "asant"           -0.06
    " sisters"        -0.06
    "-packages"       -0.06
    " shareholder"    -0.06
    POSITIVE LOGITS
    Token             Logit
    " proof"          0.16
    " Proof"          0.14
    " proofs"         0.13
    "proof"           0.11
    "Proof"           0.11
    "-proof"          0.09
    " Boo"            0.07
    " Wolf"           0.07
    " kein"           0.07
    "pf"              0.07
    Activation Density: 0.004%

    No Known Activations