INDEX
    Explanations

    references to gender roles and societal expectations

    Negative Logits
    Token               Logit
    )"),                -0.62
    modelBuilder        -0.62
    /−                  -0.58
    >>();               -0.56
    ModelSerializer     -0.55
     protoimpl          -0.54
    ]]:                 -0.51
    '};                 -0.47
     kaynağından        -0.47
     ().                -0.46
    Positive Logits
    Token               Logit
     تضيفلها            0.68
     tbh                0.65
     anyway             0.65
     too                0.61
     مرئيه              0.60
     lol                0.60
    EndContext          0.59
     anyhow             0.59
     متعلقه             0.57
    ftagPool            0.56
    Act Density 0.349%

    No Known Activations