INDEX
    Explanations

    mentions of a specific keyword "Hel" with varying emphasis indicated by different activation values

    references to health-related topics and organizations

    Negative Logits
    Token           Logit
    " Saber"        -0.70
    " Eag"          -0.67
    " selves"       -0.63
    " negatives"    -0.63
    " surrogate"    -0.60
    " Memories"     -0.58
    " Lowell"       -0.58
    "EED"           -0.57
    "rers"          -0.57
    "æī"            -0.55
    Positive Logits
    Token           Logit
    "pless"         1.24
    "mut"           1.20
    "ms"            1.17
    "ped"           1.09
    "ping"          1.07
    "mand"          1.04
    "ios"           1.02
    "iop"           0.99
    "met"           0.98
    "ps"            0.96
    Activation Density: 0.038%

    No Known Activations