INDEX
    Explanations

    references to specific events or quotes

    expressions of pain or discomfort related to emotional experiences

    Negative Logits
     unsurprisingly
    -0.62
     moreover
    -0.57
     similarly
    -0.57
    surprisingly
    -0.56
    anwhile
    -0.54
     predictably
    -0.54
     additionally
    -0.52
     meanwhile
    -0.51
     reportedly
    -0.50
     furthermore
    -0.50
    Positive Logits
     …"
    0.93
    …"
    0.90
    ..."
    0.86
    …."
    0.82
     ..."
    0.81
     fuckin
    0.73
     gonna
    0.70
    -"
    0.70
    ?"
    0.67
    ?'
    0.66
    Act Density 1.484%
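
    The page does not define Act Density. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations for one feature (not real data).
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 positions are active -> 25.000%
```

    Under this reading, a density of 1.484% would mean the feature is active on roughly 15 of every 1,000 sampled tokens.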

    No Known Activations