    Explanations

    references to violence in video games

    Negative Logits
    addock
    -0.17
    wnd
    -0.16
    ivre
    -0.16
     dagger
    -0.15
     chaud
    -0.15
     Deck
    -0.14
     scal
    -0.14
    çĭ¼
    -0.13
     spider
    -0.13
     Howe
    -0.13
    Positive Logits
     Mario
    0.22
     Luigi
    0.21
    Mario
    0.21
     Mushroom
    0.21
     SMB
    0.20
     Yoshi
    0.19
     plumber
    0.18
    kart
    0.18
    platform
    0.18
    _trampoline
    0.17
    Activation Density: 0.019%

    No Known Activations