INDEX
    Explanations

    phrases related to comparisons or contrasts

    concepts related to simplicity and significance

    Negative Logits
    "Orig"          -0.66
    "packages"      -0.62
    "ourses"        -0.61
    "eries"         -0.60
    "Ns"            -0.60
    " Erica"        -0.59
    "orig"          -0.59
    "ummies"        -0.57
    " mysteries"    -0.57
    "iaries"        -0.57

    Positive Logits
    " prolonged"    0.82
    " weakening"    0.80
    "ealous"        0.79
    " influx"       0.77
    " unchecked"    0.76
    " inaction"     0.76
    " curs"         0.76
    " glance"       0.76
    " reliance"     0.75
    "cknowled"      0.72
    Activation Density: 0.605%

    No Known Activations