INDEX
    Explanations

    concepts related to diversity and inclusion

    Negative Logits
     strengthened    -0.25
     strengthening   -0.19
     undermining     -0.17
     reinforcing     -0.17
     strengthen      -0.17
    Uses             -0.16
     measures        -0.15
     amplified       -0.15
    .React           -0.15
    akening          -0.14
    Positive Logits
     allows          0.36
     helps           0.34
     gives           0.32
     позволяет       0.32  (Russian: "allows")
     enables         0.31
    help             0.31
     makes           0.30
     means           0.30
     help            0.28
    ทำให             0.28  (Thai: "make, cause")
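The page does not say how these logit lists were computed. One common approach in sparse-autoencoder interpretability, assumed here rather than confirmed by this dashboard, is to take the dot product of the feature's decoder direction with each token's unembedding vector and rank tokens by that score. The most negative scores form the negative list and the most positive form the positive list. The function name `top_logit_effects`, the parameters `decoder_dir` and `unembed`, and the toy vectors are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the dot product of a feature's decoder direction with
    each token's unembedding vector (an assumed method, not confirmed by the page).

    Returns (most negative k, most positive k) as (token, score) pairs.
    """
    # Score every token: how strongly the feature pushes its logit up or down.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]           # most suppressed tokens first
    positive = ranked[::-1][:k]     # most promoted tokens first
    return negative, positive


# Toy, made-up data: a 2-dimensional "model" with four tokens.
toy_decoder_dir = [1.0, -0.5]
toy_unembed = {
    " allows": [0.3, -0.1],
    " helps": [0.2, -0.2],
    " strengthened": [-0.2, 0.1],
    " undermining": [-0.1, 0.05],
}
neg, pos = top_logit_effects(toy_decoder_dir, toy_unembed, k=2)
print("negative:", neg)
print("positive:", pos)
```

Note that token strings keep their leading space, since in byte-level BPE vocabularies " help" and "help" are distinct tokens, which is why both appear in the positive list above.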
    Activation Density: 1.137%
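Assuming "Act Density" means the percentage of tokens on which the feature is active (activation strictly above zero), it can be computed as in the sketch below. The name `activation_density` and the default threshold of 0.0 are illustrative assumptions, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The threshold of 0.0 is an assumption; dashboards may use a different cutoff.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on one of four tokens.
print(activation_density([0.0, 0.0, 2.1, 0.0]))
```

Under this reading, a density of 1.137% means the feature fires on roughly 1 token in 88.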

    No Known Activations