INDEX
    Explanations

    words that convey emotions or strong emphasis

    occurrences of the word "words."

    Negative Logits
    Token         Logit
    "DERR"        -0.80
    "olls"        -0.75
    "izo"         -0.72
    "ramid"       -0.71
    "vy"          -0.68
    "ño"          -0.67
    "awaru"       -0.64
    "roid"        -0.62
    " cumbers"    -0.62
    " millenn"    -0.62
    Positive Logits
    Token         Logit
    "mith"        1.44
    " spoken"     0.99
    " uttered"    0.96
    "terday"      0.87
    " words"      0.86
    "poons"       0.85
    "speak"       0.81
    " aloud"      0.81
    "words"       0.79
    "writers"     0.79
    Activation Density: 0.021%

    No Known Activations