INDEX
    Explanations

    pronouns such as "them" and "her," especially when referring to specific actions or objects

    Negative Logits
    Token             Logit
    "••"              -0.66
    " Megan"          -0.65
    "mire"            -0.62
    " Lori"           -0.61
    " Fulton"         -0.60
    " Salon"          -0.60
    "CCC"             -0.60
    " LH"             -0.58
    " Jon"            -0.58
    " Laurie"         -0.57
    Positive Logits
    Token             Logit
    "selves"          1.75
    "atically"        1.55
    " selves"         1.49
    "atic"            1.42
    "self"            1.42
    "alian"           0.94
    "atics"           0.94
    " conduc"         0.91
    " individually"   0.85
    "zbollah"         0.83
    Act Density 0.526%

    No Known Activations