    Explanations

    phrases related to progress or changes

    recurrent mentions of the word "the" across various contexts

    Negative Logits
    Token           Logit
    "hops"          -0.87
    "atures"        -0.85
    " exceeds"      -0.72
    "rooms"         -0.71
    "ients"         -0.71
    "ago"           -0.70
    " solves"       -0.69
    "chairs"        -0.68
    "rates"         -0.68
    "eds"           -0.68
    Positive Logits
    Token           Logit
    " inability"    1.09
    " emergence"    1.08
    " tendency"     1.06
    " absence"      1.05
    " presence"     1.04
    " notion"       1.02
    " idea"         0.99
    " sheer"        0.97
    " realization"  0.97
    " insistence"   0.97
    Activation Density: 0.207%

    No Known Activations