INDEX
    Explanations

    mentions of humor or humorous content

    Negative Logits
    Token           Logit
    "649"           -0.18
    "istrovství"    -0.17
    "STALL"         -0.16
    "FOUNDATION"    -0.16
    "allen"         -0.15
    "hips"          -0.15
    "iner"          -0.15
    " hrd"          -0.14
    "lew"           -0.14
    "ideo"          -0.14
    Positive Logits
    Token       Logit
    " hum"      0.25
    " Hum"      0.24
    "Hum"       0.21
    "pty"       0.21
    "mers"      0.21
    "oldt"      0.19
    "hum"       0.18
    "ankind"    0.17
    "iliate"    0.17
    "ricane"    0.17
    Activation Density: 0.018%

    No Known Activations