INDEX
    Explanations

    references to controversies or conflicts

    expressions of concern or questions about societal issues

    Negative Logits
    TAG                                 -0.70
    ,''                                 -0.69
    bole                                -0.68
    agree                               -0.66
    \)                                  -0.64
     ®                                  -0.64
    tion                                -0.63
    +)                                  -0.63
    Spoiler                             -0.63
    ================================    -0.62
    Positive Logits
     sushi      0.75
     Okin       0.73
    phalt       0.73
     dentist    0.73
     Golf       0.70
     Chilean    0.66
     eyebrow    0.66
     Hamb       0.66
     Tos        0.65
     eyel       0.64
    Act Density 1.410%

    No Known Activations