INDEX
    Explanations

    mentions of LGBTQ-related terms

    references to the LGBTQ community and related topics

    Negative Logits
     reper        -0.65
    osaurs        -0.62
    pered         -0.61
    gio           -0.61
     Rove         -0.60
     Wolver       -0.60
     scattering   -0.59
    nings         -0.59
     respir       -0.59
    llular        -0.58
    Positive Logits
    Leaks          0.85
    uably          0.81
    WER            0.81
    naire          0.80
    istani         0.78
    yan            0.72
    oman           0.71
    ecided         0.71
    erness         0.71
    endered        0.70
    Activation Density 0.018%

    No Known Activations