INDEX
    Explanations

    the third-person object pronoun “them.”

    Negative Logits
    Token       Logit
    "a"         -0.12
    "A"         -0.10
    " Nicola"   -0.09
    "а"         -0.08
    " A"        -0.08
    "te"        -0.08
    "ar"        -0.08
    "ro"        -0.08
    "e"         -0.08
    " la"       -0.08
    Positive Logits
    Token       Logit
    " them"     0.16
    " him"      0.15
    " Him"      0.13
    " HIM"      0.11
    " Them"     0.11
    " THEM"     0.10
    "him"       0.10
    "ham"       0.09
    "emen"      0.09
    " Hem"      0.09
    Act Density 0.082%
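
    The density figure above can be read as a fraction of token positions. The sketch below assumes that "Act Density" means the share of sampled token positions on which the feature's activation is above zero; the page does not state its definition. The function name, the threshold, and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 82 of them active.
sample = [1.5] * 82 + [0.0] * 99_918
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.082%"
```

    Under this reading, a density of 0.082% corresponds to roughly 8 active positions per 10,000 tokens.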

    No Known Activations