INDEX
    Explanations

    phrases related to evaluations and judgments about societal norms and personal growth

    Negative Logits
    iversit        -0.18
    urgeon         -0.18
    ÏĢη            -0.17
    енка           -0.15
     weakest       -0.14
    kker           -0.14
    aviest         -0.14
    /extensions    -0.14
    oler           -0.14
    ardless        -0.13
    Positive Logits
     simply        0.17
    ken            0.15
    gone           0.14
     Simply        0.14
    ues            0.14
     Mim           0.14
    ç              0.14
    δÏħ            0.14
    auer           0.14
    entes          0.13
    Act Density 0.307%

    No Known Activations