INDEX
    Explanations

    phrases related to negative portrayals or descriptions involving individuals

    references to characters and their roles in narratives, particularly villains and societal issues

    Negative Logits
    Token             Logit
    "Companies"       -0.69
    "atars"           -0.68
    "OY"              -0.65
    "notations"       -0.65
    " accordingly"    -0.65
    "Production"      -0.65
    "cu"              -0.65
    "videos"          -0.64
    "Wire"            -0.64
    "Tickets"         -0.64
    Positive Logits
    Token             Logit
    " sorts"          0.97
    " nowhere"        0.95
    " paradise"       0.86
    " disguise"       0.85
    " whom"           0.83
    " whose"          0.82
    " steroids"       0.82
    " contrasts"      0.79
    " nutshell"       0.74
    " exile"          0.73
    Activation Density: 0.295%

    No Known Activations