INDEX
    Explanations

    references to news outlets and media sources

    Negative Logits
     veter      -0.67
    animate     -0.64
     diaper     -0.59
     harms      -0.56
     causal     -0.55
    atible      -0.54
     surgical   -0.54
     parity     -0.54
    pires       -0.54
     doesnt     -0.52
    Positive Logits
    .           0.84
     quoted     0.77
    .</         0.76
     rhet       0.74
     quoting    0.73
    .]          0.70
    .).         0.69
     sarcast    0.69
    ."          0.67
    lied        0.67
    Activation Density: 0.145%

    No Known Activations