INDEX
    Explanations

    mentions of women in political contexts

    Negative Logits
    Token       Logit
    "mol"       -0.17
    " Dise"     -0.16
    " Herr"     -0.15
    "jen"       -0.15
    "ãĥIJãĤ¤"    -0.14
    "eson"      -0.14
    " Draco"    -0.13
    "ÏĢη"       -0.13
    " Gazette"  -0.13
    "ntag"      -0.13
    Positive Logits
    Token       Logit
    "asp"       0.15
    "ÂłkW"      0.15
    "ãĥŃãĥ¼"    0.14
    "à¥Įत"      0.13
    " <<-"      0.13
    "ãĥ¼ãĥĵ"    0.13
    "ulin"      0.13
    "azor"      0.13
    "ŀ"         0.13
    "çļĦåľ°"    0.13
    Activation Density: 0.003%

    No Known Activations