INDEX
    Explanations
    Negative Logits
    -0.08
    3
    -0.07
     Strategy
    -0.07
     Card
    -0.07
     something
    -0.07
    -0.07
    -0.07
     bow
    -0.07
    9
    -0.07
     hooks
    -0.07
    Positive Logits
    0.09
    0.08
     Roger
    0.08
     Carolyn
    0.08
     Suzanne
    0.07
     Plymouth
    0.07
     Gonzalez
    0.07
    lover
    0.07
    Roger
    0.07
    0.07
    Activation Density 0.104%

    No Known Activations