INDEX
    Explanations

    phrases indicating news articles or reports, possibly related to government or official statements

    instances of the word "the."

    Negative Logits
    "books"          -0.79
    " resemb"        -0.71
    "calling"        -0.68
    "reports"        -0.68
    "bytes"          -0.67
    "makers"         -0.67
    "making"         -0.66
    " replies"       -0.66
    " memes"         -0.65
    " indicators"    -0.65

    Positive Logits
    " concentrate"    0.81
    "iling"           0.80
    " maximize"       0.78
    " complete"       0.78
    "bern"            0.77
    "rouse"           0.77
    "ilet"            0.76
    "cca"             0.75
    "iler"            0.74
    " reach"          0.74

    Act Density 0.000%

    No Known Activations