INDEX
    Explanations

    mentions of the word "father."

    Negative Logits
    "ample"     -0.70
    "pots"      -0.65
    "enda"      -0.65
    "waves"     -0.63
    "adas"      -0.61
    "boxes"     -0.61
    "ways"      -0.60
    "ably"      -0.59
    "yz"        -0.58
    " Redux"    -0.58
    Positive Logits
    " father"        3.45
    " dad"           2.73
    " fathers"       2.58
    " grandfather"   2.33
    " mother"        2.32
    " Father"        2.17
    "Father"         2.16
    "father"         2.10
    " dads"          2.10
    " parents"       2.06
    Activation Density: 0.021%

    No Known Activations