    Explanations

    specific mentions or occurrences of the word "first" followed by a number

    Negative Logits
        Token             Logit
        "tics"            -0.75
        "uba"             -0.71
        "hawks"           -0.70
        "iths"            -0.67
        "ans"             -0.64
        "ernels"          -0.63
        "oops"            -0.62
        "iversity"        -0.61
        "iences"          -0.60
        "today"           -0.60
    Positive Logits
        Token             Logit
        " layer"           1.13
        " iteration"       1.13
        " paragraph"       1.10
        " step"            1.08
        " element"         1.07
        " section"         1.05
        " subparagraph"    1.03
        " tier"            1.01
        "most"             1.00
        " half"            0.99
    Activation Density: 0.137%
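    Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above 0, because the page does not state its threshold. `activation_density` is a hypothetical helper, not part of any dashboard's API.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    ASSUMPTION: a token counts as "active" when its activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 137 firing tokens out of 100,000.
    acts = [0.0] * 99_863 + [0.5] * 137
    print(f"{activation_density(acts):.3%}")
```

    At 0.137%, the feature fires on roughly 1 token in 730 (1 / 0.00137 ≈ 729.9), so it is a sparse feature.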

    No Known Activations