    Explanations

    internet culture descriptors

    Negative Logits
     fucking
    0.55
     fuck
    0.48
    🪄
    0.46
     asshole
    0.46
     nigga
    0.46
     prettier
    0.46
     piss
    0.44
    0.44
     charmed
    0.44
    🤎
    0.44
    Positive Logits
     potatoes
    0.56
    🥔
    0.54
     Potatoes
    0.52
     potato
    0.50
    potato
    0.49
    Potato
    0.48
     Potato
    0.47
    🦄
    0.46
    🥑
    0.46
    ams
    0.45
    Act Density 0.015%

    No Known Activations