INDEX
Explanations
references to violent incidents and deaths
Negative Logits
znam       -0.14
Chip       -0.14
cogn       -0.13
Caught     -0.13
062        -0.13
é¥         -0.13
scratch    -0.13
earn       -0.13
reck       -0.13
dio        -0.12
Positive Logits
bodies          0.23
discovery       0.22
found           0.21
discovered      0.21
FOUND           0.19
decomposition   0.19
Found           0.19
body            0.19
gefunden        0.18
found           0.18
Activations Density: 0.061%