INDEX
    Explanations

    adjectives or nouns related to harm or damage

    terms related to the concept of harm or harmfulness

    Negative Logits
    Token      Logit
    "quart"    -0.79
    "Kings"    -0.71
    "Hun"      -0.70
    "eely"     -0.68
    "ARCH"     -0.66
    "Whe"      -0.65
    "ebus"     -0.65
    "peak"     -0.65
    "TeX"      -0.64
    "gran"     -0.64
    Positive Logits
    Token              Logit
    " harmful"         1.15
    " harm"            1.03
    " undermin"        1.01
    " harms"           0.88
    " endanger"        0.85
    " adolesc"         0.85
    " detrimental"     0.85
    " consequences"    0.84
    " contamin"        0.80
    " harming"         0.80
    Activation Density: 0.008%

    No Known Activations