INDEX
    Explanations

    references to harm and physics-related terms

    terms related to harm and its effects

    Negative Logits
    "ּ" (U+05BC)   -0.87
    "onde"         -0.74
    " stakes"      -0.73
    " toes"        -0.70
    " ducks"       -0.67
    " Monteneg"    -0.66
    " thumbs"      -0.65
    " eyed"        -0.65
    " craw"        -0.64
    " gravel"      -0.64
    Positive Logits
    "harm"         3.17
    "phys"         1.39
    " physi"       1.34
    " Phys"        1.32
    " Harm"        1.28
    " pharmac"     1.23
    "Phys"         1.20
    "alter"        1.13
    "aber"         1.07
    "ulla"         1.03
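The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes each one down or up. A common way to obtain such lists is to project the feature's decoder direction onto each token's column of the model's unembedding matrix. This page does not state that this is how its lists were computed, so the sketch below is an assumption. The names `feature_logits`, `top_and_bottom`, and the toy vectors are hypothetical and used only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def feature_logits(decoder_dir: Sequence[float],
                   unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption: `unembed` maps each token string to its column of W_U.
    """
    scores = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        scores[tok] = sum(d * u for d, u in zip(decoder_dir, col))
    return scores


def top_and_bottom(logits: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    positive = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(logits.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy 2-dimensional example (made-up numbers, not this feature's real weights).
toy_unembed = {"harm": [2.0, 2.0], "phys": [1.0, 0.0],
               " toes": [-1.0, 0.0], " ducks": [0.0, -1.0]}
pos, neg = top_and_bottom(feature_logits([1.0, 0.5], toy_unembed), k=2)
print(pos)  # [('harm', 3.0), ('phys', 1.0)]
print(neg)  # [(' toes', -1.0), (' ducks', -0.5)]
```

Token strings keep their leading space because tokenizers treat " Harm" and "Harm" as distinct tokens, which is also why both forms can appear in the lists.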
    Activation Density: 0.056%
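The activation density figure is read here as the percentage of token positions on which the feature fires. Under that reading, 0.056% means roughly 56 firing positions per 100,000 tokens. The sketch below assumes a firing threshold of strictly greater than zero. The page does not state its threshold or sample size, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 56 firing positions out of 100,000 tokens gives 0.056%.
print(activation_density([1.0] * 56 + [0.0] * 99_944))  # 0.056
```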

    No Known Activations