INDEX
    Explanations

    references to harm or damage in various contexts

    Negative Logits
    Token           Logit
    " Monfieur"     -1.06
    " purpoſe"      -0.96
    " Anſ"          -0.96
    " ſeveral"      -0.96
    " itſelf"       -0.95
    " Chriftian"    -0.95
    "expandindo"    -0.92
    " pleaſure"     -0.91
    " ſtate"        -0.90
    " Majefty"      -0.89
    Positive Logits
    Token           Logit
    "stra"          0.73
    " Railway"      0.60
    "igh"           0.55
    "s"             0.54
    "z"             0.53
    " bou"          0.53
    "y"             0.52
    "u"             0.51
    " wa"           0.51
    "yl"            0.51
    Activation Density: 0.147%

    No Known Activations