INDEX
    Explanations

    references to the name "John."

    Negative Logits
    coni      -0.18
    mland     -0.16
    enger     -0.16
    lector    -0.15
    itious    -0.14
    lect      -0.14
    udad      -0.14
    ence      -0.14
    andex     -0.14
    mente     -0.14
    Positive Logits
    athan     0.27
    nie       0.20
    sWith     0.20
    sen       0.17
    sons      0.17
    mgr       0.16
    nr        0.16
    ning      0.16
    p         0.15
    ni        0.15
    Activation Density: 0.042%

    No Known Activations