INDEX
    Explanations

    phrases related to seeking help or intervention in a crisis

    statements emphasizing change or improvement in circumstances

    Negative Logits
    ãĤ´ãĥ³          -0.61
    +.              -0.53
     respectively   -0.52
     destro         -0.51
    anwhile         -0.50
    arthed          -0.50
    etheless        -0.49
    hetti           -0.49
    arnaev          -0.47
     rall           -0.46
    Positive Logits
    ,"              1.02
    %"              1.00
    ")              0.99
    ,'"             0.95
    "]              0.94
    "—              0.94
    .")             0.94
    "),             0.92
     [              0.89
     ..."           0.89
    Activation Density 0.867%

    No Known Activations