INDEX
    Explanations

    Math proofs

    Negative Logits
    "dn"               -0.08
    (not rendered)     -0.08
    " bost"            -0.08
    "Agent"            -0.08
    " abe"             -0.08
    "))))↵↵"           -0.08
    " zuvor"           -0.07
    " seben"           -0.07
    " sozial"          -0.07
    "histor"           -0.07
    Positive Logits
    " acquaintance"    0.09
    "Notation"         0.09
    " hereby"          0.08
    "Fancy"            0.08
    " Our"             0.08
    " Fakult"          0.07
    " AMC"             0.07
    " We're"           0.07
    " Arbitr"          0.07
    " lemma"           0.07
    Act Density 0.051%

    No Known Activations