INDEX
    Explanations

    terms related to gender and family structures

    Negative Logits
    ischen  -0.20
     Ihren  -0.18
    enden  -0.17
    lichen  -0.17
    ieten  -0.17
     respectively  -0.17
    uellen  -0.17
     Antworten  -0.17
    genden  -0.16
    oden  -0.16
    Positive Logits
     erste  0.25
     kleine  0.24
     neue  0.23
     große  0.22
    ige  0.22
     weitere  0.21
     deutsche  0.20
     andere  0.19
     ganze  0.19
     perman  0.19
    Act Density 0.038%

    No Known Activations