INDEX
    Explanations

    the word "norm" with varying degrees of activation

    references to societal or cultural standards

    Negative Logits
    Token            Logit
    " Ashes"         -0.70
    "UGH"            -0.64
    "hani"           -0.60
    " Khe"           -0.60
    " Kush"          -0.60
    "Package"        -0.60
    "pta"            -0.60
    " Bowl"          -0.59
    " Conspiracy"    -0.58
    " Chargers"      -0.57
    Positive Logits
    Token            Logit
    "ality"          1.13
    "ativity"        1.12
    "als"            1.06
    "atively"        0.94
    "ally"           0.91
    "heastern"       0.84
    "norm"           0.82
    "uses"           0.79
    "alties"         0.79
    "heast"          0.79
    Activation Density: 0.010%

    No Known Activations