INDEX
Explanations
phrases related to concepts of power, agency, and control in social contexts
Head Attr Weights
0:0.02
1:0.01
2:0.15
3:0.14
4:0.21
5:0.02
6:0.04
7:0.18
8:0.03
9:0.04
10:0.04
11:0.07
Negative Logits
────────: -1.58
RANT: -1.55
apter: -1.48
王: -1.43
foreseen: -1.40
ISTER: -1.37
�: -1.37
�: -1.32
�: -1.32
田: -1.30
Positive Logits
Madonna: 1.35
Pik: 1.24
Beatles: 1.22
fiction: 1.20
Medal: 1.20
lurking: 1.18
Clintons: 1.17
Milton: 1.17
entr: 1.17
nih: 1.15
Activations Density 0.053%