INDEX
    Explanations

    words that stand out or are emphasized in a text

    mention of the word "words" in various contexts

    Negative Logits
    Token         Logit
    "DERR"        -0.75
    "izo"         -0.73
    "ramid"       -0.72
    "roxy"        -0.69
    "enture"      -0.67
    "olls"        -0.65
    " cumbers"    -0.65
    "ño"          -0.64
    "romeda"      -0.63
    " notor"      -0.63
    Positive Logits
    Token         Logit
    "mith"        1.54
    " spoken"     1.11
    " uttered"    1.03
    " aloud"      0.89
    "speak"       0.87
    "press"       0.86
    " words"      0.84
    "poons"       0.82
    "sworth"      0.82
    "words"       0.78
    Activation Density: 0.023%

    No Known Activations