    Explanations

    personal pronouns followed by verbs or possessive pronouns

    pronouns referring to individuals

    Negative Logits
    earch        -0.72
    arsity       -0.67
    atlantic     -0.64
    aughtered    -0.59
    iaz          -0.57
     Gulf        -0.55
    cyclop       -0.55
    ãĥ¬          -0.54
    itol         -0.54
    atory        -0.53

    Positive Logits
    'll          1.04
    've          1.03
    'd           0.96
    're          0.92
     knew        0.81
     adore       0.79
     despise     0.73
     cannot      0.73
    self         0.71
     can         0.71
    Activation Density: 0.646%

    No Known Activations