INDEX
Explanations
references to specific studies and their outcomes
Negative Logits
Token        Logit
civilians    -0.50
Mangel       -0.47
adra         -0.47
Johansen     -0.45
bombing      -0.44
Kies         -0.44
FontStyle    -0.44
Leinwand     -0.44
CCC          -0.43
ddd          -0.43
Positive Logits
Token             Logit
CreateTagHelper   0.56
PathVariable      0.56
edges             0.55
outcome           0.54
fate              0.52
DockStyle         0.47
Outcome           0.46
__':              0.46
outcome           0.45
edge              0.44
Activations Density 0.235%