INDEX
    Explanations

    references to Nazi-related terms and historical events

    Negative Logits
    tis     -1.17
    Dub     -1.08
    area    -1.07
    Asset   -1.06
    20439   -1.05
    pring   -1.04
    clip    -0.99
    ths     -0.99
    WHERE   -0.97
    ional   -0.97
    Positive Logits
     salute    1.22
     sympath   1.19
    chwitz     1.14
     Youth     1.14
     Hitler    1.09
     takeover  1.04
    enthal     1.01
    ollah      0.99
    abad       0.99
    etz        0.98
    Activation Density: 0.965%

    No Known Activations