INDEX
Explanations
phrases indicating health issues and health-related discussions
Negative Logits
„      -0.29
(“     -0.25
“…     -0.23
“      -0.22
“[     -0.19
``     -0.19
=”     -0.19
「     -0.18
)'↵    -0.17
,“     -0.17
Positive Logits
()"↵   0.18
"↵↵    0.17
'      0.16
›      0.15
()"    0.15
"></   0.15
/fw    0.15
`      0.15
UTE    0.15
]"     0.15
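The negative and positive lists rank vocabulary tokens by how strongly the feature, when active, directly lowers or raises each token's logit. A common way sparse-autoencoder dashboards compute this is to project the feature's decoder vector through the model's unembedding matrix W_U, ignoring the final LayerNorm, and then take the extremes. That convention is assumed here, since the page does not describe its method. The function and argument names (`top_logits`, `w_dec`, `w_u`) are hypothetical, and the toy matrices are made up for illustration.

```python
import numpy as np


def top_logits(w_dec, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, logit effect) pairs.

    w_dec: (d_model,) decoder direction of one feature (hypothetical input)
    w_u:   (d_model, d_vocab) unembedding matrix
    vocab: list of d_vocab token strings
    """
    # Direct effect of one unit of the feature on every output logit
    effects = w_dec @ w_u  # shape (d_vocab,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary (made-up numbers)
w_dec = np.array([1.0, 0.0])
w_u = np.array([[1.0, -2.0, 3.0],
                [5.0, 5.0, 5.0]])
neg, pos = top_logits(w_dec, w_u, ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -2.0)] [('c', 3.0)]
```

Because only the direct path is measured, these values describe what the feature writes toward the output vocabulary. They do not describe which inputs make it fire, so they can look unrelated to the auto-interpretation label.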
Activation Density: 0.524%
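Activation density is the share of sampled tokens on which the feature fires. A density of 0.524% means the feature is active on roughly 1 token in 190. The minimal sketch below assumes that "fires" means an activation strictly above zero, computed over a 1-D array of per-token activations. The dashboard's exact threshold and token sample are not given on this page, and `activation_density` is a hypothetical helper name.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))


# Toy activations for four tokens: the feature fires on one of them
print(activation_density([0.0, 0.0, 2.0, 0.0]))  # 25.0
```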