INDEX
    Explanations
    Negative Logits
     Bon            -0.08
    cou             -0.08
    Bon             -0.08
     destructive    -0.07
     Dj             -0.07
     footing        -0.07
    wired           -0.07
    IC              -0.07
     eros           -0.07
    !?              -0.07
    Positive Logits
     Alexander      0.09
     tha            0.08
     academy        0.08
    bx              0.08
     Lee            0.07
     Proven         0.07
                    0.07
    .​               0.07
    Tel             0.07
    maid            0.07
    Activation Density: 0.001%

    No Known Activations