INDEX
    Explanations

    phrases related to emotions and personal experiences

    expressions of strong emotions or statements of clarity

    Negative Logits
    REDACTED
    -0.63
    ãĢij
    -0.62
    îĢ
    -0.60
     ().
    -0.58
    shall
    -0.57
    .*
    -0.57
    ().
    -0.56
    etheless
    -0.54
    NOW
    -0.54
    .(
    -0.54
    Positive Logits
     [
    0.98
    ,"
    0.93
    ,'"
    0.87
    ),"
    0.86
    ,''
    0.77
    .,"
    0.75
     everybody
    0.72
     ['
    0.72
     somebody
    0.67
    ,'
    0.66
    Activation Density: 1.242%

    No Known Activations