INDEX
    Explanations

    specific numerical values mentioned in the text

    Negative Logits
    Token            Logit
    "wark"           -0.77
    "ACP"            -0.76
    "Å"              -0.73
    "anu"            -0.71
    "itute"          -0.69
    "Led"            -0.69
    "È"              -0.68
    "owered"         -0.68
    " Identified"    -0.67
    "jured"          -0.66
    Positive Logits
    Token              Logit
    " blah"            1.23
    " stuff"           1.20
    " assorted"        1.00
    " maybe"           0.98
    " lots"            0.92
    " messing"         0.89
    " everything"      0.89
    " crappy"          0.87
    " goodies"         0.86
    " shenanigans"     0.86
    Activation Density: 0.326%

    No Known Activations