INDEX
    Explanations

    references to relationships and gender dynamics, particularly focusing on women and their roles

    female pronouns and women

    Negative Logits
    Token               Logit
    "UnsafeEnabled"     -0.61
    " noDo"             -0.58
    "sizeCache"         -0.57
    "MigrationBuilder"  -0.56
    " виправивши"       -0.55
    " ویکی‌آمباردا"     -0.52
    "ActionCreators"    -0.51
    (blank)             -0.51
    "僕も"              -0.49
    "僕は"              -0.48
    Positive Logits
    Token               Logit
    "Obrigada"          0.55
    " Women"            0.51
    "ค่ะ"               0.48
    "Women"             0.48
    " Woman"            0.41
    "łam"               0.39
    " kadın"            0.38
    "Beijos"            0.38
    "ftagPool"          0.36
    "Попис"             0.36
    Activation Density: 0.280%

    No Known Activations