    Explanations

    news-related and crime-related phrases

    Negative Logits
    Token          Logit
    'tein'         -0.80
    ' Stall'       -0.73
    'orem'         -0.69
    ' Schne'       -0.67
    ' Dictionary'  -0.66
    ' GC'          -0.66
    'aceae'        -0.64
    ' Scale'       -0.64
    ' Schedule'    -0.63
    ' simulator'   -0.62
    Positive Logits
    Token          Logit
    'politics'     0.80
    'news'         0.79
    'middle'       0.74
    'breaking'     0.71
    'dp'           0.68
    ' NEWS'        0.68
    'ontent'       0.67
    '"]=>'         0.67
    'truth'        0.65
    'inion'        0.65
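    The two tables list the tokens whose logits this feature pushes down and up most strongly. The page does not say how the scores were computed. The sketch below assumes they come from a common dashboard approach: take the dot product of the feature's output direction with each token's unembedding vector, then keep the k highest and k lowest scores. The names `direction`, `unembed`, `logit_effects`, and `top_bottom_k` are hypothetical and do not refer to any real library.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical inputs (assumptions, not from the dashboard):
#   direction: stands in for the feature's output direction
#   unembed:   maps each vocabulary token to its unembedding vector

def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of `direction` with its unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}

def top_bottom_k(scores: Dict[str, float],
                 k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

if __name__ == "__main__":
    # Toy two-dimensional example; the vectors are made up for illustration.
    direction = [1.0, 0.0]
    unembed = {"news": [0.8, 0.1], "tein": [-0.8, 0.3], "middle": [0.74, 0.0]}
    pos, neg = top_bottom_k(logit_effects(direction, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

    Sorting once and slicing from both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting every score.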
    Activation Density: 0.138%
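    The page does not define activation density. A common reading, assumed here, is the percentage of token positions on which the feature's activation is above zero. Under that assumption, 0.138% means the feature fires on roughly 138 of every 100,000 tokens. The function name `activation_density` and its `threshold` parameter are illustrative and hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of positions where the feature
    is active, reported as a percentage.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count this position as "active"
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

if __name__ == "__main__":
    # Made-up data: 138 active positions out of 100,000.
    acts = [1.0] * 138 + [0.0] * (100_000 - 138)
    print(f"{activation_density(acts):.3f}%")
```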

    No Known Activations