    Explanations

    mentions of genders and relationships between boys and girls

    Negative Logits
    "aker"    -0.19
    "wich"    -0.18
    "emen"    -0.17
    "ITTER"   -0.16
    "ermen"   -0.16
    "estre"   -0.16
    "erman"   -0.15
    "itter"   -0.15
    ".psi"    -0.15
    " Niet"   -0.14
    Positive Logits
    "friend"    0.22
    " Scout"    0.22
    " Scouts"   0.21
    " scout"    0.21
    "hood"      0.20
    "Friend"    0.19
    "cott"      0.19
    "friends"   0.19
    " scouts"   0.18
    " freund"   0.17
    Activation Density: 0.022%

    No Known Activations