Explanations
phrases indicating health-related actions and conditions, particularly involving medical attention and compliance
Negative Logits
EITHER    -0.17
oved      -0.16
etrics    -0.14
Outlet    -0.14
anness    -0.14
á»ķ       -0.14
.mods     -0.14
either    -0.14
anes      -0.14
anth      -0.14
Positive Logits
overall   0.18
other     0.17
erna      0.15
Overall   0.15
ones      0.14
Lair      0.14
isoner    0.14
plets     0.14
heck      0.14
udo       0.14
Activations Density: 0.137%