INDEX
    Explanations

    profane or strong language

    references to curses or swearing

    Negative Logits
    Token            Logit
    "姫"             -1.00
    "nington"        -0.81
    "atican"         -0.80
    "parency"        -0.78
    "issance"        -0.77
    "arnaev"         -0.76
    "olitan"         -0.76
    "oulos"          -0.76
    "anooga"         -0.75
    "itutional"      -0.74
    Positive Logits
    Token            Logit
    " curse"         0.91
    "words"          0.84
    " curses"        0.84
    " words"         0.78
    "hammer"         0.78
    " cursing"       0.75
    " cursed"        0.73
    "bones"          0.70
    "word"           0.67
    "vine"           0.67
    Activation Density: 0.036%

    No Known Activations