INDEX
    Explanations

    references to specific places and names, potentially related to historical or cultural contexts

    Negative Logits
    Token           Logit
    "lect"          -0.84
    "htaking"       -0.83
    "erm"           -0.81
    "BOOK"          -0.78
    "sign"          -0.75
    "urally"        -0.68
    "clud"          -0.68
    "galitarian"    -0.68
    "elman"         -0.66
    "ery"           -0.65
    Positive Logits
    Token           Logit
    "aja"            1.00
    "ashtra"         0.86
    "ía"             0.83
    "oglu"           0.83
    "ths"            0.81
    " Province"      0.81
    "thur"           0.81
    " Tsarnaev"      0.74
    "azine"          0.74
    " Hussain"       0.74
    Activation Density: 0.086%

    No Known Activations