INDEX
    Explanations

    expressions of pride or self-importance related to achievements or identity

    Negative Logits
    merce
    -0.15
    ofday
    -0.15
    uran
    -0.15
     Fuj
    -0.15
    oran
    -0.14
    onth
    -0.14
    laws
    -0.14
    eki
    -0.14
    oth
    -0.14
    allon
    -0.13
    Positive Logits
    ably
    0.16
    uzzi
    0.16
     Počet
    0.14
     proud
    0.14
    ór
    0.14
     Fior
    0.14
    ρια
    0.14
    準
    0.14
     mantle
    0.14
     unwrap
    0.14
    Activation Density: 0.035%

    No Known Activations