INDEX
    Explanations

    the word "norm" followed by a high activation score

    instances of the word "norm" and its variations

    Negative Logits
    " Ashes"     -0.65
    "UGH"        -0.65
    "hani"       -0.64
    " Shades"    -0.61
    "OTS"        -0.60
    " Tea"       -0.60
    "cig"        -0.59
    " Kush"      -0.59
    " Bowl"      -0.57
    "lder"       -0.57
    Positive Logits
    "ativity"     1.02
    "ality"       1.00
    "norm"        0.85
    "als"         0.84
    "atively"     0.77
    " norm"       0.76
    "ally"        0.76
    "mble"        0.74
    "alties"      0.74
    "essage"      0.73
    Activation Density: 0.011%

    No Known Activations