    Explanations

    words related to manipulation or coercion

    variations of the word "entertainment" or related terms

    Negative Logits
     Responsibility
    -0.78
     Spears
    -0.76
    姫
    -0.75
     Accountability
    -0.70
     Jenner
    -0.70
    BILITIES
    -0.69
     STAR
    -0.68
    pmwiki
    -0.68
    Universal
    -0.66
    士
    -0.65
    Positive Logits
    ailed
    1.06
    ourage
    1.00
    rave
    1.00
    renched
    0.99
    inence
    0.96
    ailing
    0.95
    uring
    0.95
     ent
    0.94
    rust
    0.92
    rench
    0.91
    Act Density 0.007%

    No Known Activations