    Explanations

    instances of the word "mean" with high activation values

    references to statistical measures, particularly the term "mean."

    Negative Logits
    Token            Logit
    "dfx"            -0.85
    "DOM"            -0.81
    "taboola"        -0.77
    "ASED"           -0.75
    "@#&"            -0.74
    "thumbnails"     -0.73
    "Newsletter"     -0.72
    "conservancy"    -0.70
    "anon"           -0.70
    "UNCH"           -0.70
    Positive Logits
    Token            Logit
    " spirited"       0.92
    "ings"            0.80
    "erest"           0.76
    "ingly"           0.73
    "ity"             0.71
    "est"             0.69
    "ework"           0.68
    "ening"           0.68
    "ness"            0.67
    "eway"            0.66
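Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive tokens. The sketch below assumes that procedure; the page does not state how its values were computed or whether they were normalized, and the function name `top_logit_effects` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive
    tokens by the dot product of the feature's decoder direction with each
    token's unembedding vector."""
    # Logit effect per token (assumed definition: plain dot product).
    effects = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy usage with made-up 2-dimensional vectors (not real model weights).
toy_unembed = {"tok_a": [-1.0, 0.0], "tok_b": [0.5, 0.5], "tok_c": [1.0, 0.2]}
neg, pos = top_logit_effects([0.9, 0.1], toy_unembed, k=1)
print(neg, pos)
```

With real weights, `decoder_dir` would be one row of the sparse autoencoder's decoder and `unembed` would cover the full vocabulary, with `k=10` matching the length of the lists shown here.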
    Activation Density: 0.018%
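An activation density of 0.018% corresponds to 0.00018 as a fraction, or about 18 active tokens per 100,000. The sketch below assumes density means the share of sampled tokens on which the feature's activation is strictly greater than zero; the dashboard's actual threshold and token sample are not given on this page, and `activation_density` is a hypothetical name.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumed criterion for "active"
            active += 1
    return active / total if total else 0.0  # avoid division by zero

# 18 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_982 + [1.0] * 18
print(f"{activation_density(acts):.3%}")
```

At this rate the feature fires very rarely, which is consistent with a dashboard having no stored activation examples to display.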

    No Known Activations