INDEX
    Explanations

    mentions of the word "sister" at high activations

    references to siblings, specifically sisters

    Negative Logits
    Token          Logit
    "veyard"       -0.74
    "ered"         -0.68
    "ustomed"      -0.67
    "Frames"       -0.67
    "ocalypse"     -0.63
    "tarians"      -0.63
    "ankind"       -0.62
    "upuncture"    -0.61
    "urations"     -0.61
    "atility"      -0.61
    Positive Logits
    Token          Logit
    "hood"          1.11
    " sister"       0.94
    "hips"          0.92
    "heses"         0.89
    " sisters"      0.79
    "hesis"         0.79
    "folk"          0.79
    "fax"           0.75
    " aunt"         0.73
    " Sister"       0.73
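    The dashboard does not say how these logit scores are computed. A common convention for sparse-autoencoder features is to project the feature's decoder direction onto the model's unembedding matrix and report the most positive and most negative tokens. The sketch below assumes that convention. The helpers `logit_scores` and `top_logits` are hypothetical, and the two-dimensional vectors are invented for illustration; they are not the real model weights or the values listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: score each token by the dot product of the feature's
# decoder direction with that token's unembedding vector (an assumption about
# how the lists above were produced).
def logit_scores(decoder_dir: List[float],
                 unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: return the k most positive and k most negative tokens.
def top_logits(scores: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy 2-dimensional example; all vectors are invented for illustration.
decoder_dir = [1.0, 0.5]
unembed = {
    " sister": [0.8, 0.4],
    " aunt": [0.6, 0.3],
    "veyard": [-0.5, -0.4],
    "ered": [-0.6, 0.0],
}
positive, negative = top_logits(logit_scores(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

    The token strings are kept verbatim, including leading spaces (" sister" and "sister" are different tokens), because the ranking is per vocabulary entry rather than per word.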
    Act Density 0.013%
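    "Act Density" is assumed here to mean the percentage of token positions on which the feature activates at all (activation above zero). Under that reading, 0.013% corresponds to roughly 1 token in 7,700. The hypothetical helper `activation_density` below is a minimal sketch of that calculation; the threshold and the activation values are illustrative assumptions.

```python
from typing import Sequence

# Hypothetical helper: percentage of token positions where the feature's
# activation exceeds the threshold (assumed to be 0, i.e. "fires at all").
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Invented example: 13 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(13):
    acts[i * 7_000] = 1.5
print(f"{activation_density(acts):.3f}%")  # prints 0.013%
```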

    No Known Activations