INDEX
Explanations
contexts related to disasters or catastrophic events
Head Attr Weights
0:0.01
1:0.01
2:0.08
3:0.09
4:0.16
5:0.03
6:0.08
7:0.30
8:0.03
9:0.03
10:0.06
11:0.05
Negative Logits
eret                -1.97
chnology            -1.80
iera                -1.76
aiden               -1.74
older               -1.73
ibrary              -1.66
isSpecialOrderable  -1.66
essional            -1.64
版                  -1.63
escription          -1.62
Positive Logits
havoc           1.87
vib             1.80
glitches        1.74
inequ           1.69
boredom         1.66
Rebell          1.63
contradictions  1.61
rud             1.57
disappoint      1.56
jealousy        1.48
Activations Density 0.000%