INDEX
Explanations
critical perspectives on social and political issues
Head Attr Weights
0:0.02
1:0.01
2:0.16
3:0.12
4:0.14
5:0.02
6:0.02
7:0.13
8:0.04
9:0.05
10:0.13
11:0.09
Negative Logits
unloaded:-1.45
achev:-1.33
abus:-1.29
hump:-1.28
arted:-1.23
destiny:-1.23
hillary:-1.22
root:-1.19
iem:-1.19
0000000000000000:-1.18
Positive Logits
nowadays:1.73
姫:1.67
adays:1.45
orn:1.43
Occupations:1.34
cially:1.32
since:1.31
ORPG:1.28
Uncommon:1.27
sers:1.27
Activations Density 0.193%