INDEX
    Explanations

    mentions of large quantities of entities or activities

    instances of the word "scores" followed by numerical values or references to quantities

    Negative Logits
    Token           Logit
    "Forge"         -0.67
    "ned"           -0.65
    " deduction"    -0.63
    "ulkan"         -0.62
    " dissolution"  -0.62
    "UAL"           -0.62
    "iator"         -0.61
    "Correction"    -0.60
    " necessity"    -0.59
    "does"          -0.59
    Positive Logits
    Token           Logit
    "paces"         0.99
    "dozen"         0.93
    "poons"         0.93
    "imilar"        0.90
    " thousand"     0.88
    " dozen"        0.83
    "omething"      0.82
    "arnaev"        0.80
    "everal"        0.78
    "chool"         0.78
    Activation Density: 0.016%

    No Known Activations