INDEX
    Explanations

    names or references to prominent individuals or figures

    Negative Logits
     extremes   -0.15
     Vern       -0.15
     Epid       -0.14
    chn         -0.14
    DAL         -0.14
    ses         -0.14
    emple       -0.14
     châu       -0.14
    /share      -0.14
    fal         -0.13

    Positive Logits
    peare       0.19
    زاده        0.17
    baz         0.16
    256         0.16
    apult       0.15
    akespeare   0.15
    ií          0.15
    eldon       0.15
    afs         0.15
    eneg        0.15

    Act Density 0.024%

    No Known Activations