    Explanations

    mentions of specific celebrities, particularly Beyoncé and Rihanna

    Negative Logits
    Token           Logit
    "ugu"           -0.73
    " hemor"        -0.60
    " positional"   -0.60
    "igun"          -0.59
    "Assembly"      -0.58
    "ategory"       -0.58
    " rink"         -0.58
    "rongh"         -0.58
    "ictionary"     -0.57
    " deduction"    -0.57
    Positive Logits
    Token           Logit
    "cé"            1.51
    " Beyon"        0.98
    "ce"            0.98
    "gments"        0.92
    "nect"          0.91
    "issance"       0.87
    "kees"          0.85
    "bird"          0.83
    "ciples"        0.82
    "tics"          0.82
    Activation density: 0.003%

    No Known Activations