Explanations
phrases related to medical studies or clinical trials
Negative Logits
Token         Logit
_OK           -0.14
lund          -0.14
alta          -0.14
slug          -0.14
Incoming      -0.14
ews           -0.14
Dam           -0.13
crafts        -0.13
benh          -0.13
BackPressed   -0.13
Positive Logits
Token         Logit
safety        0.19
Safety        0.19
-arm          0.17
usercontent   0.17
afety         0.16
Safety        0.16
intent        0.15
elage         0.15
Arm           0.15
sponsor       0.15
Activations Density: 0.014%