INDEX
    Explanations

    references to significant historical events or figures related to World War II

    Negative Logits
    sey
    -0.15
    hani
    -0.15
    찰
    -0.15
    ucus
    -0.14
    tic
    -0.14
    _HOT
    -0.14
    に向
    -0.14
    lisi
    -0.14
    Anc
    -0.14
    nia
    -0.14
    Positive Logits
     Hitler
    0.23
    Hit
    0.19
    193
    0.19
     Revision
    0.17
     Nazi
    0.17
     NS
    0.17
     Hit
    0.17
     Blitz
    0.16
    194
    0.16
     Adolf
    0.15
    Activation Density: 0.099%

    No Known Activations