INDEX
    Explanations

    specific words in a text, focusing on words rather than the context or structure of the sentences

    Negative Logits
    Token        Logit
    "DERR"       -0.96
    "roxy"       -0.83
    "taboola"    -0.77
    "©¶æ¥µ"      -0.75
    "psey"       -0.75
    "abama"      -0.72
    "ersen"      -0.72
    "rero"       -0.71
    "ahon"       -0.70
    " Democr"    -0.70

    Positive Logits
    Token        Logit
    "mith"        1.17
    "sworth"      1.08
    " ptr"        0.93
    "ifier"       0.89
    "press"       0.82
    " diction"    0.80
    " phrases"    0.79
    " uttered"    0.79
    "words"       0.79
    "processor"   0.78
    Activation Density: 0.040%

    No Known Activations