INDEX
Explanations
content related to socio-political issues and historical events in China
Head Attribution Weights (head: weight)
0:0.03
1:0.01
2:0.11
3:0.47
4:0.07
5:0.06
6:0.02
7:0.06
8:0.03
9:0.05
10:0.01
11:0.02
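This page does not say how these weights are defined. One plausible reading is assumed here. The SAE is trained on the concatenated outputs of all 12 heads, so a feature's decoder vector splits into 12 per-head chunks. Each head's weight is then its chunk's norm divided by the sum of all chunk norms. The stdlib sketch below follows that assumption. `head_attribution` is a hypothetical helper, not a library function.

```python
import math

# Weights as listed on this page (head index -> weight).
REPORTED = {0: 0.03, 1: 0.01, 2: 0.11, 3: 0.47, 4: 0.07, 5: 0.06,
            6: 0.02, 7: 0.06, 8: 0.03, 9: 0.05, 10: 0.01, 11: 0.02}


def head_attribution(decoder_vec, n_heads):
    """Hypothetical helper: share of the decoder norm carried by each head.

    Assumes decoder_vec is the concatenation of n_heads equal-length chunks,
    one per head, with head 0 first.
    """
    if n_heads <= 0 or len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must split evenly across heads")
    d_head = len(decoder_vec) // n_heads
    # L2 norm of each head's slice of the decoder vector.
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        raise ValueError("decoder vector is all zeros")
    return [n / total for n in norms]


# The listed weights put almost half the mass on a single head.
dominant_head = max(REPORTED, key=REPORTED.get)
print(dominant_head, REPORTED[dominant_head])  # 3 0.47
```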
Negative Logits
(token not shown)  -6.66
[…]  -6.62
″  -5.80
—  -5.58
…  -5.31
…]  -4.89
�  -4.68
‐  -4.68
🙂  -4.63
…"  -4.62
Positive Logits
--  11.35
!--  8.70
``  8.64
)--  8.14
``  7.54
.--  7.29
---  7.17
----  7.02
-->  5.55
----------  5.54
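Dashboards commonly build top and bottom logit lists by projecting the feature's output direction onto the unembedding matrix and keeping the most extreme entries. The sketch below makes three assumptions. First, the feature's decoder direction has already been mapped into the residual stream; for an attention-output SAE, that means through each head's output matrix. Second, the final LayerNorm is ignored. Third, the functions `logit_effects` and `top_bottom` and the toy matrices are all hypothetical.

```python
def logit_effects(resid_dir, W_U):
    """Logit contribution of resid_dir to every vocabulary token.

    resid_dir: list of d_model floats; W_U: d_model rows of d_vocab floats.
    """
    d_vocab = len(W_U[0])
    # Dot product of the direction with each unembedding column.
    return [
        sum(resid_dir[i] * W_U[i][j] for i in range(len(resid_dir)))
        for j in range(d_vocab)
    ]


def top_bottom(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    bottom = [(vocab[j], effects[j]) for j in order[:k]]
    top = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return top, bottom


# Toy example: d_model = 2, vocabulary of three tokens.
W_U = [[2.0, 0.0, 1.0],
       [0.0, 3.0, 1.0]]
top, bottom = top_bottom(logit_effects([1.0, -1.0], W_U), ["a", "b", "c"], k=1)
print(top, bottom)  # [('a', 2.0)] [('b', -3.0)]
```

With `k=10`, this yields two ten-entry lists shaped like the ones above. Repeated display strings such as the two `` entries can occur when distinct tokens render identically.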
Activation Density: 0.093%
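Activation density is normally the percentage of sampled tokens on which the feature fires. At 0.093%, this feature is active on roughly 93 of every 100,000 tokens. In the sketch below, "fires" is taken to mean an activation strictly above zero. That threshold is an assumption, and `activation_density` is a hypothetical name.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 93 active tokens out of 100,000 reproduces the 0.093% figure.
acts = [0.0] * 100_000
for i in range(93):
    acts[i * 1000] = 1.0
print(activation_density(acts))  # 0.093
```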