Explanations
murder and deaths of families
Negative Logits
Collision   -0.77
lerde       -0.73
ella        -0.72
нале        -0.71
regions     -0.71
色          -0.71
Joystick    -0.71
囚          -0.70
verti       -0.69
cohorts     -0.69
Positive Logits
family      1.14
murder      1.12
murder      1.09
throats     1.05
murders     1.03
massac      0.99
massacre    0.98
execution   0.97
famili      0.96
family      0.94
Activations Density 0.034%