INDEX
    NEGATIVE LOGITS
    Token            Logit
    "rrggbb"         -0.60
    " Whitby"        -0.57
    "alsey"          -0.53
    "}>;"            -0.49
    " Maritime"      -0.46
    " CPO"           -0.44
    " Clearwater"    -0.44
    " Baran"         -0.44
    "vallis"         -0.44
    "guous"          -0.43
    POSITIVE LOGITS
    Token            Logit
    " joke"          1.00
    " Joke"          0.99
    "joke"           0.96
    "Joke"           0.96
    " jokes"         0.91
    "Jokes"          0.89
    " broma"         0.85
    " Jokes"         0.83
    " joking"        0.79
    "jokes"          0.77
    Activation Density 0.003%

    No Known Activations