INDEX
    Explanations

    Names of individuals, particularly those with the first name "Thomas" or "Tom."

    Negative Logits
    Token      Logit
    ouz        -0.16
    ToLocal    -0.15
    enta       -0.14
    ynes       -0.14
    ipse       -0.14
    elyn       -0.14
    obao       -0.14
    idlo       -0.14
    nds        -0.13
    cbc        -0.13

    Positive Logits
    Token      Logit
    -chan      0.16
    ors        0.15
    uke        0.15
    нав        0.15
    er         0.15
    zilla      0.14
    uto        0.14
    echa       0.14
    egal       0.14
    å±         0.14

    Activation Density: 0.019%

    No Known Activations