INDEX
    Explanations

    instances of the word "kill" and its variations related to violence

    NEGATIVE LOGITS
    "iland"        -0.18
    "onto"         -0.15
    "ulled"        -0.15
    "/out"         -0.14
    "iÃŁ"          -0.14
    "/Framework"   -0.14
    " Blur"        -0.14
    "elect"        -0.14
    "oha"          -0.14
    "land"         -0.14
    POSITIVE LOGITS
    " off"         0.23
    "joy"          0.20
    " spree"       0.20
    "/disable"     0.20
    "switch"       0.19
    "æĪ"           0.19
    "lier"         0.18
    "deer"         0.18
    "çİ°åľº"       0.17
    "ibri"         0.17
    Activation Density: 0.047%

    No Known Activations