INDEX
    Explanations

    phrases that express positive sentiment or approval

    instances of the word "the," indicating a focus on definite references or common nouns in the text

    Negative Logits
     according
    -0.74
    imi
    -0.73
    inel
    -0.71
     thereby
    -0.69
    Ò
    -0.69
     voluntarily
    -0.68
    ™:
    -0.68
     ›
    -0.67
    †
    -0.67
    ashington
    -0.66
    Positive Logits
    oret
    1.50
     easiest
    1.26
     simplest
    1.24
     downside
    1.21
     biggest
    1.21
     slightest
    1.13
    resa
    1.10
     hardest
    1.08
     brightest
    1.08
     quickest
    1.07
    Act Density 0.573%

    No Known Activations