INDEX
    Explanations

    mentions of the word "news" at high activation levels

    words and phrases related to news and reporting

    Negative Logits
    Token               Logit
    " mutually"         -0.65
    "istically"         -0.64
    " rapists"          -0.64
    " rapist"           -0.62
    " goodwill"         -0.62
    " Reconstruction"   -0.62
    " unarmed"          -0.61
    " disarm"           -0.60
    "sighted"           -0.59
    "uras"              -0.59
    Positive Logits
    Token               Logit
    "chool"              1.02
    "ews"                1.00
    "ource"              0.96
    " VIDEOS"            0.91
    "hower"              0.89
    "peed"               0.89
    "atcher"             0.86
    "ystem"              0.86
    "ashington"          0.86
    "velt"               0.85
    Activation Density: 0.006%

    No Known Activations