INDEX
    Explanations

    topics related to social issues, especially in relation to gender, politics, and popular culture

    Negative Logits
    ï¼¥
    -0.17
    ãĤ¨
    -0.17
     Dear
    -0.17
    _dw
    -0.17
    -E
    -0.17
     ÐĶ
    -0.15
    _e
    -0.15
    	E
    -0.15
    _E
    -0.15
    -e
    -0.15
    Positive Logits
     G
    0.16
     Gibson
    0.16
    ÂłG
    0.15
     Gan
    0.15
    ÂłF
    0.15
     F
    0.15
    G
    0.14
    ãĥ³ãĥķ
    0.14
    asje
    0.14
    ghi
    0.14
    Act Density 0.032%

    No Known Activations