INDEX
    Explanations

    phrases related to social or political issues, laws, and regulations

    references to political ideologies and gender-related beliefs

    Negative Logits
    wcs         -0.50
    ensional    -0.48
    minist      -0.48
    odcast      -0.47
     QUEST      -0.46
     :=         -0.46
     aback      -0.45
    ciplinary   -0.44
    ourning     -0.43
    ptoms       -0.43
    Positive Logits
    )."     0.92
    ").     0.90
    ).[     0.86
    ]."     0.83
    %).     0.81
    ?).     0.81
    )).     0.80
    ).      0.79
    !).     0.78
    .).     0.77
    Activation Density: 4.054%

    No Known Activations