Explanations
descriptions of injuries and their effects
Negative Logits
Token       Logit
Agencies    -0.15
itu         -0.14
oxygen      -0.14
dust        -0.14
Grow        -0.14
unan        -0.14
Fires       -0.14
aln         -0.13
agencies    -0.13
ilian       -0.13
Positive Logits
Token       Logit
spill       0.28
dri         0.26
mess        0.26
spills      0.24
pooling     0.24
pudd        0.23
pooled      0.22
overflow    0.22
spilled     0.22
mess        0.22
Activations Density: 0.109%