    Explanations

    phrases related to controversial or sensitive topics, such as LGBTQ+ rights, political issues, and social justice

    references to individuals and their personal connections or identities

    Head Attr Weights
    0:0.12
    1:0.03
    2:0.12
    3:0.14
    4:0.06
    5:0.09
    6:0.04
    7:0.04
    8:0.06
    9:0.11
    10:0.08
    11:0.05
    Negative Logits
    ��:-1.24
    etheus:-1.17
     skelet:-1.15
    Published:-1.05
     Spiegel:-1.04
    isphere:-1.04
    PDATE:-1.03
    GBT:-1.02
    FontSize:-1.00
    PsyNetMessage:-1.00
    Positive Logits
    flies:1.21
    illin:1.18
     Chicken:1.18
    oglu:1.18
     Farms:1.11
     Drops:1.09
    ynski:1.08
    uan:1.07
    ille:1.06
     Slime:1.06
    Act Density 0.036%

    No Known Activations