INDEX
    Explanations

    references to violence and serious societal issues

    Negative Logits
    "chin"       -0.16
    " ghosts"    -0.14
    "éric"       -0.14
    "ichel"      -0.14
    "china"      -0.14
    "öt"         -0.14
    "chw"        -0.14
    "kea"        -0.14
    "verage"     -0.14
    "anager"     -0.14
    Positive Logits
    " hide"      0.43
    " hor"       0.42
    "hor"        0.36
    "hide"       0.34
    " rep"       0.31
    " Hide"      0.30
    " ab"        0.29
    "Hor"        0.29
    " gh"        0.28
    " sick"      0.28
    Activation Density: 0.380%

    No Known Activations