INDEX
Explanations
references to power dynamics and gender perceptions in social contexts
Head Attribution Weights
0:0.03
1:0.12
2:0.03
3:0.03
4:0.05
5:0.12
6:0.02
7:0.03
8:0.10
9:0.32
10:0.05
11:0.04
Negative Logits
continue    -2.52
undertake   -2.47
acquire     -2.46
occupy      -2.46
ustomed     -2.46
fare        -2.45
venture     -2.39
liberate    -2.34
carve       -2.34
earn        -2.33
Positive Logits
ifies         3.21
communicates  3.09
inates        3.00
its           2.99
pires         2.99
ㅋㅋ          2.95
doesn         2.87
acters        2.81
applies       2.79
"""           2.77
Activations Density 0.043%