INDEX
    Explanations

    references to experiences and changes over time

    Negative Logits
    Token         Logit
    "lettes"      -0.17
    "ispers"      -0.16
    ".ir"         -0.15
    "ruk"         -0.14
    " fast"       -0.14
    "ighth"       -0.14
    "apons"       -0.14
    "esco"        -0.14
    " indef"      -0.14
    "hort"        -0.14

    Positive Logits
    Token         Logit
    " rarity"     0.19
    " novelty"    0.18
    " nov"        0.18
    "orks"        0.17
    " breaking"   0.16
    "etty"        0.16
    "foreign"     0.16
    " Breaking"   0.16
    " Novel"      0.16
    "enga"        0.16

    Activation Density: 0.144%

    No Known Activations