    Negative Logits
    " Laugh"       -0.15
    " mockery"     -0.14
    " mocked"      -0.12
    " amused"      -0.11
    " mocking"     -0.10
    " Junk"        -0.10
    "æĥij"         -0.10
    "goog"         -0.09
    " laughing"    -0.09
    " ridicule"    -0.09
    Positive Logits
    " kidding"     0.29
    " joke"        0.28
    " pun"         0.23
    " rib"         0.23
    " jokes"       0.23
    " jest"        0.23
    " facet"       0.19
    " cracking"    0.18
    " wis"         0.18
    "pun"          0.18
    Activation Density: 0.225%

    No Known Activations