Explanations
Instances of various types of conflict or confrontation
Head Attr Weights
0:0.17
1:0.07
2:0.04
3:0.09
4:0.02
5:0.02
6:0.03
7:0.01
8:0.16
9:0.04
10:0.08
11:0.21
Negative Logits
discontin -1.71
Lau -1.57
favorably -1.45
coordin -1.45
referen -1.44
Lans -1.43
Monte -1.41
internationally -1.38
numer -1.37
Pau -1.36
Positive Logits
hog 1.93
Osw 1.82
bilt 1.78
avage 1.74
@@ 1.72
aceous 1.72
Hob 1.61
Cf 1.56
<|endoftext|> 1.55
WHERE 1.55
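Positive and negative logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. The sketch below illustrates that idea under stated assumptions: the function names (`logit_effects`, `top_and_bottom`), the toy vectors, and the three-token vocabulary are hypothetical, and a real dashboard may apply extra steps (such as normalization) that are not shown here.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Score each token j as the dot product of decoder_dir with column j of unembed.

    Assumption: unembed is a d_model x vocab_size matrix (rows = model dims).
    """
    d_model = len(decoder_dir)
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    return scores

def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return (k most negative, k most positive) token/score pairs."""
    ranked = sorted(scores, key=lambda p: p[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy example: d_model = 2, vocabulary of 3 tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -1.0, 0.5],
           [0.4, 0.0, -1.0]]
vocab = ["a", "b", "c"]

scores = logit_effects(decoder_dir, unembed, vocab)
negative, positive = top_and_bottom(scores, k=1)
print("negative:", negative)
print("positive:", positive)
```

Here token "c" gets the largest positive score (1.0) and "b" the most negative (-1.0), which mirrors how the highest- and lowest-ranked tokens above would be selected from the full vocabulary.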
Activations Density 0.006%
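Activation density is usually read as the fraction of token positions on which the feature fires (activation above zero). Under that assumption, 0.006% is 6 × 10⁻⁵, or roughly 6 firing positions per 100,000 tokens. The sketch below computes it; the function name `activation_density`, the zero threshold, and the synthetic activations are illustrative assumptions, not the dashboard's actual code.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is > threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic data: 6 active positions out of 100,000.
acts = [0.0] * 99_994 + [1.3] * 6
density = activation_density(acts)
print(f"{density:.3%}")  # 0.006%
```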