INDEX
    Explanations

    verbs indicating someone speaking or providing information

    instances of quotation marks or reported speech

    Negative Logits
    Token             Logit
    "tumblr"          -0.77
    "LET"             -0.72
    "arest"           -0.70
    "illions"         -0.70
    "Holy"            -0.69
    "isable"          -0.69
    "respective"      -0.68
    "pe"              -0.66
    "Pages"           -0.66
    "TABLE"           -0.65
    Positive Logits
    Token             Logit
    " bluntly"        0.83
    " sarcast"        0.82
    " anecd"          0.81
    " afterward"      0.78
    " goodbye"        0.72
    "heit"            0.70
    " emphatically"   0.70
    "essler"          0.69
    " KR"             0.69
    "doms"            0.68
    Activation Density: 0.175%

    No Known Activations