    Explanations

    mentions of diversity and related concepts

    references to diversity in various contexts

    Negative Logits
    Token             Logit
    "ENA"             -0.90
    "amina"           -0.84
    " CPC"            -0.75
    "ש"               -0.72
    "RL"              -0.70
    "mentioned"       -0.69
    "ATA"             -0.69
    "hiba"            -0.67
    "CHA"             -0.66
    "nington"         -0.65

    Positive Logits
    Token             Logit
    " Diversity"      1.01
    "iveness"         0.95
    " diversity"      0.86
    " genders"        0.75
    "ively"           0.75
    " perspectives"   0.75
    "ethnic"          0.74
    "emale"           0.73
    "yip"             0.71
    "itarian"         0.70

    Activation Density: 0.036%

    No Known Activations