    Explanations

    names of news outlets in the text

    Negative Logits
    Token           Logit
    "figure"        -0.74
    "}}}"           -0.70
    " obser"        -0.65
    "Interstitial"  -0.63
    "gradient"      -0.62
    "emort"         -0.61
    "taboola"       -0.60
    " vulner"       -0.60
    "cffffcc"       -0.59
    "fig"           -0.57
    Positive Logits
    Token           Logit
    " that"         1.08
    "that"          0.92
    " he"           0.84
    " they"         0.84
    " there"        0.76
    " she"          0.75
    "覚醒"          0.73
    " it"           0.69
    " THAT"         0.67
    " yesterday"    0.65
    Activation Density: 0.106%

    No Known Activations