INDEX
    Explanations

    phrases related to social issues, particularly gender-related conflicts and political discussions

    Negative Logits
    Token    Logit
    .).      -0.79
    ]).      -0.74
    ]."      -0.72
    )).      -0.69
    }.       -0.64
    .'"      -0.62
    ).[      -0.62
    ].       -0.60
    )."      -0.60
    !).      -0.58
    Positive Logits
    Token      Logit
    ãĥĺãĥ©     0.56
    izont      0.50
    ãĥİ        0.48
    emale      0.48
    akeru      0.46
    reens      0.45
    esides     0.45
    NAME       0.43
    renheit    0.43
    aeda       0.42
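
    Two of the positive-logit tokens (ãĥĺãĥ© and ãĥİ) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn such strings back into text. It assumes the model uses a GPT-2 style byte-to-unicode mapping, which this page does not state, and the helper names `bytes_to_unicode` and `decode_token` are chosen here for illustration only.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the feature's model uses this mapping; the dashboard does not say so.
    """
    # Bytes that are already printable map to the character with the same code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to bytes, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so invalid sequences are replaced instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĺãĥ©", "ãĥİ", "izont"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    If that assumption holds, the two garbled entries would decode to katakana fragments (ヘラ and ノ), while plain ASCII tokens such as izont are left unchanged.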
    Act Density 2.279%

    No Known Activations