INDEX
    Explanations

    titles of articles or pieces of writing

    references to publications and their titles

    Negative Logits
    "unts"          -0.66
    "ordinate"      -0.64
    " trophy"       -0.63
    " handc"        -0.62
    "activation"    -0.61
    " stret"        -0.61
    " ambul"        -0.60
    "zers"          -0.59
    " perimeter"    -0.59
    " standby"      -0.59
    Positive Logits
    " excerpts"     0.98
    " essays"       0.93
    "published"     0.87
    "blogs"         0.82
    " blogs"        0.82
    " satirical"    0.81
    " plagiar"      0.80
    "articles"      0.79
    " blog"         0.78
    " commentary"   0.78
    Act Density 0.574%

    No Known Activations