INDEX
    Explanations

    instances of the word "first."

    Negative Logits
    " Contributions"    -0.65
    " Canaver"          -0.64
    "Quantity"          -0.64
    "ovie"              -0.62
    "ruct"              -0.61
    " Gould"            -0.60
    "Stat"              -0.59
    " SOM"              -0.59
    " Doomsday"         -0.59
    " Buildings"        -0.58

    Positive Logits
    " baseman"           1.22
    " responders"        1.08
    " glance"            1.01
    " appeared"          0.92
    " blush"             0.82
    " foray"             0.79
    " glimpse"           0.78
    " encountered"       0.78
    " encount"           0.77
    " flew"              0.76

    Act Density: 0.016%

    No Known Activations