INDEX
    Explanations

    references to contentious political issues and debates

    Negative Logits
    "gra"            -0.15
    "/popper"        -0.14
    " vertically"    -0.14
    "еÑĨ"            -0.14
    " pud"           -0.14
    " Smile"         -0.14
    " Vall"          -0.14
    "äch"            -0.14
    "OUCH"           -0.14
    "oring"          -0.13
    Positive Logits
    " gif"           0.18
    "gif"            0.18
    "aya"            0.17
    "DAT"            0.16
    " gifs"          0.16
    "ternet"         0.15
    " DERP"          0.14
    "ube"            0.14
    "efon"           0.14
    " shim"          0.14
    Activation Density: 1.038%

    No Known Activations