INDEX
    Explanations

    expressions of wonder and appreciation in various contexts

    Negative Logits
    Token            Logit
    " nice"          -0.26
    "nice"           -0.19
    "Nice"           -0.19
    " interesting"   -0.19
    " Ñĥда"          -0.18
    " pleasant"      -0.18
    " fine"          -0.17
    "interesting"    -0.17
    " attractive"    -0.17
    " Nice"          -0.17
    Positive Logits
    Token            Logit
    " simply"        0.27
    " jaw"           0.26
    " beyond"        0.25
    "phen"           0.24
    " Amazing"       0.23
    " mind"          0.23
    " Phen"          0.23
    "jaw"            0.23
    " phen"          0.23
    " incredible"    0.23
    Activation Density: 0.307%

    No Known Activations