    Explanations

    pronouns followed by verbs

    Negative Logits
    Token          Logit
    "hips"         -0.62
    " Polk"        -0.58
    "911"          -0.56
    " Friend"      -0.54
    "ーティ"       -0.54
    "quist"        -0.51
    " Uni"         -0.50
    " Gloria"      -0.49
    "friends"      -0.49
    " Pearce"      -0.48
    Positive Logits
    Token          Logit
    "self"         0.93
    "chy"          0.88
    "alian"        0.86
    "unes"         0.86
    "zbollah"      0.83
    "iner"         0.77
    "ueller"       0.77
    "chwitz"       0.70
    "asca"         0.70
    "achi"         0.68
    Activation Density: 0.350%

    No Known Activations