INDEX
Explanations
references to violent events and individuals involved in those incidents
Negative Logits
ittle      -0.14
uggest     -0.14
lds        -0.14
ahas       -0.14
ugg        -0.14
ÏģÏī       -0.14
onde       -0.14
leak       -0.13
arena      -0.13
_given     -0.13
Positive Logits
matching   0.25
dressed    0.21
believed   0.20
aged       0.20
Matching   0.19
fitting    0.19
matching   0.18
who        0.18
whom       0.18
Matching   0.17
Activations Density 0.069%