    Negative Logits
    Token      Value
    'A'        0.77
    'ل'        0.70
    ' '        0.66
    '")'       0.63
    'т'        0.59
    'Mun'      0.59
    'Were'     0.58
    'F'        0.54
    ' were'    0.54
    'Moon'     0.53
    Positive Logits
    Token         Value
    ' mothers'    0.72
    ' dads'       0.71
    'רי'          0.71
    ' moms'       0.71
    ' стаўкі'     0.70
    'lių'         0.68
    'li'          0.65
    'mothers'     0.64
    ' WOMEN'      0.64
    ' PROBLEM'    0.63
    Activation Density: 0.001%

    No Known Activations