    Explanations

    references to wars, particularly World Wars and their specific contexts

    Negative Logits
    " fourth"     -0.15
    " fifth"      -0.15
    " third"      -0.15
    " sixth"      -0.14
    " forth"      -0.14
    " Stim"       -0.13
    "xxxxxxxx"    -0.13
    "forth"       -0.13
    " unf"        -0.13
    " ninth"      -0.13
    Positive Logits
    " II"           0.27
    "II"            0.23
    " العالمية"     0.19
    "(World"        0.19
    " Two"          0.18
    "Two"           0.17
    "âħ"            0.16
    "lesia"         0.16
    "coma"          0.16
    "zimmer"        0.16
    Activation Density: 0.007%

    No Known Activations