INDEX
    Explanations

    instances of humor and jokes

    Negative Logits
    Token         Logit
    "olstadt"     -0.52
    "следова"     -0.50
    "sc"          -0.50
    "en"          -0.50
    " zab"        -0.49
    " zb"         -0.49
    " HAPP"       -0.48
    " happy"      -0.48
    " teng"       -0.48
    " random"     -0.47
    Positive Logits
    Token         Logit
    "joke"        1.13
    " joke"       1.12
    " Joke"       1.11
    " joking"     1.08
    " jokes"      1.02
    " Jokes"      0.98
    "jokes"       0.98
    "Joke"        0.98
    "Jokes"       0.90
    " joked"      0.85
    Activation Density: 0.008%

    No Known Activations