    Explanations

    gendered pronouns and titles

    Negative Logits
    " 女の子"      0.52
    " man"        0.47
    "tic"         0.46
    " juxtap"     0.46
    " pria"       0.46
    " src"        0.45
    " guy"        0.45
    "aan"         0.44
    " KCl"        0.44
    "男人"         0.43

    Positive Logits
    "ی"           0.55
    " Gender"     0.53
    "Gender"      0.53
    "男女"         0.48
    "ключение"    0.46
    "الل"         0.45
    " genders"    0.44
    "ара"         0.44
    " heartbreak" 0.44
    "ला"          0.43

    Activation Density: 0.025%

    No Known Activations