    Explanations

    verbs or phrases related to causing harm or suffering

    words related to causing harm or suffering

    Negative Logits
    Token            Logit
    "runner"         -0.82
    "cube"           -0.78
    "cius"           -0.77
    "wagen"          -0.75
    " Ou"            -0.71
    "kj"             -0.70
    "chrom"          -0.69
    " Blackwell"     -0.69
    "mom"            -0.68
    "zo"             -0.67

    Positive Logits
    Token            Logit
    " inflicted"     1.32
    " inflict"       1.04
    " inflicting"    1.03
    " veter"         1.00
    " inflic"        0.97
    " adolesc"       0.96
    "hesda"          0.89
    "terness"        0.88
    " wounds"        0.86
    " eleph"         0.84

    Activation Density: 0.012%

    No Known Activations