INDEX
    Explanations

    articles and quantifiers in the text

    Negative Logits
    tests     -0.99
    Events    -0.97
    alties    -0.94
    grounds   -0.92
    iments    -0.88
    votes     -0.88
    agents    -0.84
    rates     -0.82
    words     -0.82
    Init      -0.82
    Positive Logits
     silhouette   1.09
     bunch        1.05
     replica      1.03
     swast        1.02
     glimpse      1.00
     plethora     1.00
     miniature    1.00
     cardboard    1.00
     handful      0.99
     suitcase     0.98
    Activation Density: 0.255%

    No Known Activations