INDEX
Explanations
details related to physical harm or injury
graphic descriptions of violence and injury
Negative Logits
ivals        -0.91
agents       -0.82
Families     -0.75
Income       -0.70
Reward       -0.70
actionDate   -0.69
Squadron     -0.68
edia         -0.67
Franch       -0.67
enario       -0.65
Positive Logits
protr         1.11
amput         1.09
numb          1.06
swollen       1.00
shaved        0.96
throb         0.96
abnorm        0.93
clenched      0.84
dors          0.84
thickness     0.83
Activation Density: 0.387%