INDEX
    Explanations

    politically-related names and terms

    mentions of political figures and related contexts

    Negative Logits
    Token           Logit
    " Niet"         -0.65
    "anwhile"       -0.58
    " Hiroshima"    -0.58
    "mble"          -0.54
    "FUL"           -0.52
    "semble"        -0.49
    "eatures"       -0.48
    " Wem"          -0.48
    "tml"           -0.47
    "oenix"         -0.46
    Positive Logits
    Token           Logit
    "'s"            0.76
    "Care"          0.63
    "care"          0.62
    "Semitism"      0.61
    "ís"            0.61
    "omics"         0.55
    " meddling"     0.54
    "´"             0.53
    " anymore"      0.52
    "ani"           0.50
    Activation Density: 0.499%

    No Known Activations