INDEX
    Negative Logits (tokens quoted; a leading space is part of the token)
    Token        Logit
    'cheng'      -0.07
    ' pawn'      -0.07
    ' hers'      -0.07
    ' Darwin'    -0.07
    ' Aaron'     -0.07
    '"No'        -0.07
    ' Allen'     -0.07
    ' Pawn'      -0.07
    'YES'        -0.07
    ' ys'        -0.06
    Positive Logits (tokens quoted; a leading space is part of the token)
    Token             Logit
    ' Spect'          0.15
    ' spect'          0.13
    ' spectator'      0.11
    ' spectators'     0.11
    ' spectacle'      0.11
    'spect'           0.11
    ' spectral'       0.09
    ' spectacular'    0.09
    'pectral'         0.08
    ' spectro'        0.08
    Activation Density: 0.009%

    No Known Activations