INDEX
    Explanations

    phrases indicating health issues and health-related discussions

    Negative Logits
     „
    -0.29
     (“
    -0.25
     “â̦
    -0.23
    -0.22
     “[
    -0.19
     ``
    -0.19
    =”
    -0.19
     「
    -0.18
    )'↵
    -0.17
    ,“
    -0.17
    Positive Logits
    ()"↵
    0.18
    "↵↵
    0.17
     '
    0.16
    ›
    0.15
    ()"
    0.15
    "></
    0.15
    /fw
    0.15
     `
    0.15
    UTE
    0.15
    ]"
    0.15
    Act Density 0.524%
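
The activation density figure can be read as the share of token positions on which the feature fires at all. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample activations are hypothetical and are not taken from the dashboard's own pipeline.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (here: above-threshold) activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, of which 5 activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the dashboard's style.
print(f"Act Density {density:.3%}")
```

Under this reading, a density of 0.524% would mean the feature is active on roughly 5 of every 1,000 token positions sampled.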

    No Known Activations