INDEX
    Explanations

    references to prominent historical or cultural figures

    Negative Logits
    Token     Logit
    afia      -0.18
    worm      -0.16
    istry     -0.15
    icolor    -0.15
    次        -0.15
    ика       -0.14
    lessly    -0.14
    lein      -0.14
    pps       -0.14
    yles      -0.14
    Positive Logits
    Token     Logit
     Sym      0.16
    ess       0.15
    ISCO      0.15
    ough      0.15
    athan     0.15
    ruc       0.15
    295       0.15
    nist      0.15
    446       0.15
    oldem     0.15
    Activation Density: 0.128%

    No Known Activations