INDEX
    Explanations
    NEGATIVE LOGITS
    "obooks"        -0.08
    "!='"           -0.08
    "mers"          -0.07
    "Tomorrow"      -0.07
    (not shown)     -0.07
    ".can"          -0.07
    " endeavor"     -0.07
    "man"           -0.07
    "cipline"       -0.07
    "рас"           -0.07
    POSITIVE LOGITS
    " defective"    0.09
    " ergo"         0.09
    "SO"            0.08
    " potenci"      0.08
    " owo"          0.08
    " SO"           0.08
    " hence"        0.08
    " BUT"          0.08
    " Sinn"         0.07
    " fraudulent"   0.07
    Activation Density: 0.012%

    No Known Activations