    Explanations

    mentions of the word "who."

    Negative Logits
    Token     Logit
    scan      -0.16
    mente     -0.15
    andon     -0.15
    stick     -0.15
    abbo      -0.15
    icens     -0.15
    type      -0.15
    raf       -0.14
    tti       -0.14
    tt        -0.14

    Positive Logits
    Token     Logit
    oping      0.30
    oped       0.23
    ever       0.20
    ops        0.17
    upon       0.17
    soever     0.17
    /if        0.16
    onto       0.16
    osh        0.15
    ’d         0.15
    Activation Density: 0.129%

    No Known Activations