INDEX
    Explanations

    words related to the evaluation of social norms and expectations regarding women's roles

    Negative Logits
    etc             -0.39
     подоб          -0.38
    forChild        -0.38
     Ordin          -0.38
     eikä           -0.38
    Etc             -0.37
     Etc            -0.36
                    -0.36
     sice           -0.36
    offline         -0.36
    Positive Logits
     sebaliknya     0.97
     наоборот       0.85
     juist          0.85
     conversely     0.82
     justru         0.78
     downright      0.75
     inkább         0.75
     malah          0.74
     vielmehr       0.74
    それとも        0.73
    Act Density 0.881%

    No Known Activations