INDEX
    Explanations

    mentions of family relationships, particularly uncles and aunts

    Negative Logits
     Daughter
    -0.19
     granddaughter
    -0.19
     Wife
    -0.17
    303
    -0.17
     Sons
    -0.16
     grandson
    -0.16
     Fathers
    -0.16
    413
    -0.15
    476
    -0.15
    妻
    -0.15
    Positive Logits
     uncle
    0.49
     Uncle
    0.49
    Unc
    0.49
    unc
    0.47
     Unc
    0.45
     aunt
    0.42
    _unc
    0.39
     Aunt
    0.39
     Cous
    0.35
    叔
    0.35
    Act Density 0.230%

    No Known Activations