INDEX
    Explanations

    seem to be numerical patterns or sequences

    instances of the word "order" used in various contexts

    Negative Logits
    Token          Logit
    "vae"          -0.88
    "peria"        -0.78
    "ipedia"       -0.77
    "lasses"       -0.75
    "rities"       -0.75
    "reath"        -0.73
    "espie"        -0.72
    "abies"        -0.71
    "attery"       -0.69
    "practice"     -0.69
    Positive Logits
    Token             Logit
    "lies"            1.29
    "liness"          1.20
    "eering"          0.92
    "book"            0.83
    " cancell"        0.81
    " fulfillment"    0.79
    "books"           0.73
    " Mant"           0.72
    " issued"         0.72
    " fulfilled"      0.72
    Activation Density: 0.053%

    No Known Activations