INDEX
Explanations
phrases related to medical conditions and their implications
Negative Logits
threatened  -0.15
matching    -0.15
needed      -0.14
tainted     -0.14
805         -0.14
_seconds    -0.14
disturbed   -0.14
620         -0.14
adverse     -0.13
eker        -0.13
Positive Logits
caused      0.22
treat       0.21
prevent     0.20
CAUSED      0.17
incur       0.17
prevent     0.17
ama         0.17
cured       0.17
reversible  0.16
worse       0.16
Activations Density 0.255%