INDEX
    Explanations

    references to harm in various contexts

    Negative Logits
    Token               Logit
    WebElementEntity    -0.78
     Leighton           -0.77
    Tach                -0.77
    \}\\                -0.74
     Dowling            -0.73
     gynhyrchwyd        -0.71
    例文帳に追加          -0.71
    element             -0.70
     Dresden            -0.69
    "]);                -0.69
    Positive Logits
    Token       Logit
     harm       1.23
    harm        1.23
     Harm       1.17
    Harm        1.16
     harms      1.11
     Harms      1.09
    Harmful     1.05
     Hurt       1.02
     harmed     0.91
    Hurt        0.88
    Activation Density: 0.008%

    No Known Activations