    Explanations

    mentions of familial relationships, specifically relating to aunts and nieces

    Negative Logits
    "_"         -0.57
    "}}"        -0.50
    "}$}"       -0.48
    " }"        -0.48
    " }}$}"     -0.47
    " Nemo"     -0.47
    "})()"      -0.47
    "},'"       -0.47
    "<_>"       -0.47
    "]</"       -0.46
    Positive Logits
    " aunt"         1.83
    " Aunt"         1.74
    "Aunt"          1.66
    " aunts"        1.58
    "aunt"          1.17
    " Auntie"       0.96
    "Aun"           0.95
    " tía"          0.91
    " aun"          0.89
    " grandmother"  0.88
    Activation Density: 0.002%

    No Known Activations