INDEX
Explanations
events related to train incidents and their consequences
Head Attr Weights
0:0.03
1:0.03
2:0.05
3:0.32
4:0.09
5:0.05
6:0.07
7:0.05
8:0.05
9:0.05
10:0.12
11:0.06
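The per-head weights above can be ranked to see which attention head dominates the attribution. The following is a minimal sketch, not part of the dashboard itself: the names `head_attr_weights` and `rank_heads` are hypothetical, and reading the values as approximate shares of a total is an assumption (the listed values are rounded, so they need not sum to exactly 1).

```python
# Head attribution weights copied from the list above (head index -> weight).
# Treating them as approximate shares of a total is an assumption.
head_attr_weights = {
    0: 0.03, 1: 0.03, 2: 0.05, 3: 0.32, 4: 0.09, 5: 0.05,
    6: 0.07, 7: 0.05, 8: 0.05, 9: 0.05, 10: 0.12, 11: 0.06,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]
total = sum(head_attr_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

On these values, head 3 carries the largest weight and head 10 the second largest; the remaining heads each contribute under 0.10.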
Negative Logits
}.             -2.27
.''            -2.18
iability       -2.17
customization  -2.08
alogy          -2.07
};             -2.05
ilitarian      -2.05
‑              -2.03
ingo           -2.01
Footnote       -1.98
Positive Logits
ASHINGTON   3.28
IMAGES      2.67
ccording    2.63
WASHINGTON  2.28
Posted      2.26
Updated     2.23
NEW         2.13
ANGEL       2.11
LOS         2.07
headlines   2.06
Activations Density 0.297%