INDEX
Explanations
complexities and contradictions in narratives or arguments
Head Attr Weights
0:0.04
1:0.05
2:0.01
3:0.09
4:0.06
5:0.06
6:0.07
7:0.02
8:0.40
9:0.08
10:0.03
11:0.03
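As a minimal sketch of how the head attribution weights above can be read, the snippet below copies the listed values into a dictionary and reports the head with the largest weight along with the sum of the listed weights. The names `head_weights` and `dominant_head` are hypothetical and chosen only for illustration; they do not come from the dashboard or from any tool's API.

```python
# Head attribution weights copied from the list above (head index -> weight).
# The dictionary and function names are illustrative assumptions.
head_weights = {
    0: 0.04, 1: 0.05, 2: 0.01, 3: 0.09, 4: 0.06, 5: 0.06,
    6: 0.07, 7: 0.02, 8: 0.40, 9: 0.08, 10: 0.03, 11: 0.03,
}


def dominant_head(weights):
    """Return (head index, weight) for the largest attribution weight."""
    head = max(weights, key=weights.get)
    return head, weights[head]


total = sum(head_weights.values())
top_head, top_weight = dominant_head(head_weights)

print(f"sum of listed weights: {total:.2f}")
print(f"dominant head: {top_head} (weight {top_weight:.2f})")
```

On the values listed here, head 8 carries by far the largest weight, and the listed weights do not sum exactly to 1, which may simply reflect rounding in the display.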
Negative Logits
ldom       -2.28
eday       -2.23
ouble      -2.19
etheless   -2.17
alions     -2.08
aths       -2.06
gemony     -1.88
idth       -1.84
��         -1.84
iden       -1.83
Positive Logits
iHUD       1.80
shock      1.71
Omaha      1.69
           1.69
HUD        1.61
hay        1.61
FX         1.60
eering     1.57
electric   1.54
Shock      1.53
Activations Density 0.001%