INDEX
    Explanations

    phrases related to medical experiments and treatments

    references to placebos and their effects in experimental contexts

    Negative Logits
    Token         Logit
    "eways"       -0.73
    "hani"        -0.72
    "laws"        -0.71
    "lar"         -0.69
    "Allen"       -0.68
    " IPM"        -0.68
    "clud"        -0.68
    "Greg"        -0.67
    "sections"    -0.67
    "dar"         -0.66
    Positive Logits
    Token         Logit
    " placebo"    1.12
    "veyard"      0.90
    " analges"    0.70
    " aspirin"    0.70
    " Downs"      0.68
    " mosqu"      0.67
    " baseline"   0.67
    " conclud"    0.66
    " augment"    0.65
    "ength"       0.65
    Activation Density: 0.010%

    No Known Activations