INDEX
Explanations
phrases related to medical experiments and treatments
references to placebos and their effects in experimental contexts
Negative Logits
eways     -0.73
hani      -0.72
laws      -0.71
lar       -0.69
Allen     -0.68
IPM       -0.68
clud      -0.68
Greg      -0.67
sections  -0.67
dar       -0.66
Positive Logits
placebo    1.12
veyard     0.90
analges    0.70
aspirin    0.70
Downs      0.68
mosqu      0.67
baseline   0.67
conclud    0.66
augment    0.65
ength      0.65
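The positive and negative logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of that ranking. It assumes, as is common for sparse-autoencoder dashboards but not stated here, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The functions `logit_scores` and `top_logits` and the toy vectors are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

def logit_scores(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_logits(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy 2-dimensional example with made-up vectors (hypothetical data).
decoder_dir = [1.0, 0.5]
unembed = {
    "placebo": [1.0, 0.2],
    "aspirin": [0.6, 0.2],
    "laws": [-0.7, 0.0],
    "Greg": [-0.4, -0.5],
}
positive, negative = top_logits(logit_scores(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

Sorting once and slicing from both ends keeps the sketch simple. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.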
Activation density: 0.010%
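A density of 0.010% means the feature is active on roughly 1 in 10,000 tokens, so it fires very sparsely. The sketch below assumes that density is the fraction of sampled tokens whose activation is strictly above zero. The dashboard does not define it, and both the zero threshold and the helper `activation_density` are assumptions for illustration. The hypothetical sample is chosen so the result comes out to the same 0.010%.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; a dashboard may use
    a different cutoff."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 20,000 tokens, and the feature fires on 2 of them.
acts = [0.0] * 20_000
acts[123] = 3.2
acts[9_876] = 1.7
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.010%
```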