INDEX
    Explanations

    references to specific years and events in history

    Negative Logits
        " поба"        -0.19
        "urum"         -0.18
        " зависим"     -0.17
        "reich"        -0.16
        "ignum"        -0.15
        " eldre"       -0.14
        "macros"       -0.14
        "unning"       -0.14
        "avaş"         -0.14
        " Lucas"       -0.14
    Positive Logits
        " na"           0.28
        " tu"           0.25
        " w"            0.19
        " Tu"           0.19
        "Tu"            0.18
        " przez"        0.18
        " już"          0.18
        " po"           0.17
        "tu"            0.17
        " Na"           0.17
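    The two lists rank output tokens by how strongly this feature lowers or raises their logits. A minimal sketch of how such a ranking could be produced, assuming each token's value is the feature's decoder direction projected through the unembedding matrix (a common convention for sparse-autoencoder dashboards, but an assumption here). The function names, the tiny vocabulary, and the weights are hypothetical toy values, not this feature's data.

```python
def logit_effects(decoder_dir, W_U):
    """Direct logit effect per token: decoder_dir @ W_U.

    Assumption: W_U is a list of d_model rows, each with one entry per vocab token.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[d] * W_U[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return (positive, negative) lists of (token, value), strongest first."""
    effects = logit_effects(decoder_dir, W_U)
    # Sort token indices by effect, most negative first.
    order = sorted(range(len(vocab)), key=lambda i: effects[i])
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    positive = [(vocab[i], effects[i]) for i in reversed(order[-k:])]
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = [[1.0, 0.5, -1.0, -0.2],
       [0.0, 0.5,  0.0, -0.4]]
decoder_dir = [0.2, 0.1]
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # tok_a (~0.2), then tok_b (~0.15)
print(neg)  # tok_c (~-0.2), then tok_d (~-0.08)
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real dashboard would apply the same idea over the full vocabulary.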
    Activation Density: 0.039%
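    A density of 0.039% corresponds to roughly 39 firing tokens per 100,000. A minimal sketch of the calculation, assuming density is the percentage of tokens whose activation is above zero (the usual dashboard definition, stated here as an assumption). The function name and the synthetic activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one token")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic example: 39 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(39):
    acts[i * 2500] = 1.3

print(f"{activation_density(acts):.3f}%")  # 0.039%
```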

    No Known Activations