INDEX
    Explanations
    Negative Logits
    Token          Logit
     trophies      -0.06
                   -0.06
     reduction     -0.06
    enuous         -0.06
     cosine        -0.06
     animated      -0.06
     dancers       -0.06
     teenagers     -0.06
    _And           -0.06
     hairs         -0.06
    Positive Logits
    Token          Logit
     Butt          0.07
    175            0.06
    #----------------------------------------------------------------------------  0.06
    iplina         0.06
    """),↵         0.06
    215            0.06
     Simmons       0.06
    TestingModule  0.06
    765            0.06
     ini           0.06
    Act Density 0.004%

    No Known Activations