Explanations
references to violence in video games
Negative Logits
addock    -0.17
wnd       -0.16
ivre      -0.16
dagger    -0.15
chaud     -0.15
Deck      -0.14
scal      -0.14
çĭ¼       -0.13
spider    -0.13
Howe      -0.13
Positive Logits
Mario          0.22
Luigi          0.21
Mario          0.21
Mushroom       0.21
SMB            0.20
Yoshi          0.19
plumber        0.18
kart           0.18
platform       0.18
_trampoline    0.17
Activations Density 0.019%