    Explanations

    Instances of the word "who."

    Negative Logits
      Token         Logit
      "robat"       -0.20
      "ted"         -0.17
      " Darling"    -0.17
      "bian"        -0.16
      "mented"      -0.15
      "bol"         -0.15
      "cline"       -0.15
      "aises"       -0.15
      "nt"          -0.15
      "net"         -0.14
    Positive Logits
      Token         Logit
      "ops"          0.30
      "ever"         0.30
      " else"        0.28
      "opi"          0.26
      "oping"        0.26
      "osh"          0.25
      " am"          0.23
      "op"           0.22
      "opsy"         0.22
      "ope"          0.21
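    The page does not say how these logit values were produced. A common approach in sparse-autoencoder dashboards, assumed here rather than confirmed, is to project the feature's decoder direction through the model's unembedding matrix and list the lowest- and highest-scoring tokens. The sketch below is pure Python; the helper names `logit_effects` and `top_and_bottom` and all weights are hypothetical, and it does not reproduce the numbers above.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the output logits.
# Assumption: effect[token] = dot(decoder_dir, unembed[:, token]), a projection
# many SAE dashboards use; the page above does not state its exact method.


def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, effect) pairs, one per vocabulary entry.

    decoder_dir: list of d_model floats (the feature's output direction)
    unembed: d_model rows, each a list of len(vocab) floats
    """
    effects = []
    for j, token in enumerate(vocab):
        effect = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects, k):
    """Return the k highest-effect and the k lowest-effect tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    bottom = ranked[:k]                 # lowest first: tokens the feature suppresses most
    top = list(reversed(ranked))[:k]    # highest first: tokens the feature promotes most
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up weights).
    vocab = ["ops", "ever", " else", "robat"]
    decoder_dir = [1.0, 2.0]
    unembed = [
        [0.5, 0.25, 0.0, -0.5],
        [0.25, 0.25, 0.125, -0.25],
    ]
    top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), 2)
    print("highest:", top)    # [('ops', 1.0), ('ever', 0.75)]
    print("lowest:", bottom)  # [('robat', -1.0), (' else', 0.25)]
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; on a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.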
    Activation Density: 0.025%

    No Known Activations
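    Activation density is usually reported as the share of sampled tokens on which a feature's activation is above zero. This page does not define the term, so treat that reading as an assumption. Under it, 0.025% works out to 1 token in 4,000, or roughly 25 tokens per 100,000. The sketch below shows that arithmetic; the helper names `activation_density` and `expected_active_tokens` and the sample data are hypothetical.

```python
# Sketch: activation density as the fraction of sampled tokens on which a feature fires.
# Assumption: "fires" means activation > threshold (default 0.0); the dashboard's
# exact threshold and token sample are not stated on the page.


def activation_density(activations, threshold=0.0):
    """Return the fraction of activation values strictly above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent, n_tokens):
    """Convert a density given in percent to an expected count of active tokens."""
    return density_percent / 100 * n_tokens


if __name__ == "__main__":
    # One firing token out of 4,000 gives a density of 0.025%.
    sample = [0.0] * 3999 + [1.2]
    print(f"{activation_density(sample):.3%}")            # 0.025%
    print(round(expected_active_tokens(0.025, 100_000)))  # 25
```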