    Explanations

    references to gender attitudes and equality in society

    Negative Logits
        Token           Logit
        TestingModule   -0.94
        Efq             -0.87
        myſelf          -0.83
        ſever           -0.83
        Reſ             -0.82
        Anſ             -0.81
        purpoſe         -0.80
        \{\\            -0.79
        GEBURTSDATUM    -0.79   (German: "date of birth")
        kasarigan       -0.78

    Positive Logits
        Token           Logit
        dotyczą          0.72   (Polish: "concern, relate to")
        topics           0.72
        titled           0.69
        topic            0.67
        tentang          0.67   (Indonesian: "about")
        about            0.64
        regarding        0.63
        关于             0.62   (Chinese: "about")
        "                0.61
        Topic            0.61
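The logit lists above show which output tokens the feature pushes up or down. A minimal sketch of how such lists can be produced, assuming (as is common for sparse-autoencoder feature dashboards, though this page does not state its method) that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. The function names, toy weights, and four-token vocabulary below are hypothetical and not taken from this feature:

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token by projecting the feature's decoder
    direction (length d_model) through the unembedding matrix
    (d_model rows x vocab_size columns)."""
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_logits(scores: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Hypothetical toy weights: d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, 0.0, -1.0],
]
tokens = ["about", "myself", "topic", "Efq"]

scores = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(scores, tokens, k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting all scores once and slicing both ends keeps the two lists consistent with each other. Real dashboards may also fold in normalization or bias terms, which this sketch omits.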
    Activation Density: 0.754%
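Activation density is the share of sampled tokens on which the feature fires. A minimal sketch, assuming a token counts as active when its activation is strictly above a threshold of zero (the threshold and sample size used for this page are not stated). The function name and the sample below are hypothetical: a feature active on 754 of 100,000 tokens would show a density of 0.754%.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activation_density needs at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 754 of 100,000 tokens.
acts = [1.0] * 754 + [0.0] * (100_000 - 754)
print(f"{activation_density(acts):.3f}%")  # 0.754%
```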

    No Known Activations