INDEX
    Explanations

    mentions of specific names

    Negative Logits
     mothers
    -0.10
     Mothers
    -0.10
     mate
    -0.10
     himself
    -0.10
    大家
    -0.10
     fleets
    -0.09
     collabor
    -0.09
     author
    -0.09
     dude
    -0.09
     granddaughter
    -0.09
    Positive Logits
     duo
    0.42
     two
    0.36
    二人
    0.32
    两个
    0.32
     pair
    0.31
    two
    0.29
    两人
    0.28
     Duo
    0.27
     Two
    0.26
    Two
    0.25
    Act Density 0.358%

    No Known Activations