    Explanations

    explaining problematic jokes

    Negative Logits
        "AMENTE"        0.74
        "urie"          0.73
        "az"            0.72
        "ana"           0.70
        "uel"           0.68
        "tokens"        0.68
        "Token"         0.68
        "ann"           0.67
        "ă"             0.66
        "utral"         0.66
    Positive Logits
        " cadmium"      0.88
        " walkway"      0.85
        " bulky"        0.85
        " protective"   0.84
        " mountainous"  0.82
        " prestigious"  0.82
        " bustling"     0.82
        " infused"      0.81
        " prospective"  0.79
        " hilltop"      0.79
    Activation Density: 0.003%

    No Known Activations