    Explanations

    mentions of statements made by individuals

    instances of the word "said."

    Negative Logits
    Token        Logit
    "=~=~"       -0.86
    "asu"        -0.82
    "ptives"     -0.77
    "EDIT"       -0.75
    "folios"     -0.71
    "pleting"    -0.71
    "ntil"       -0.66
    "à¦"         -0.65
    "EMBER"      -0.65
    "ernels"     -0.64
    Positive Logits
    Token           Logit
    " goodbye"      1.24
    "doms"          0.86
    " hello"        0.83
    " aloud"        0.80
    "mith"          0.76
    " anecd"        0.71
    " afterward"    0.69
    " Goodbye"      0.69
    "ieu"           0.68
    " they"         0.67
    Activation Density: 0.231%

    No Known Activations