INDEX
Explanations
negative expressions and themes related to social challenges and issues
Head Attr Weights
0:0.02
1:0.02
2:0.05
3:0.05
4:0.04
5:0.03
6:0.33
7:0.20
8:0.04
9:0.06
10:0.05
11:0.04
Negative Logits
istries       -1.32
:{            -1.32
�             -1.29
Horizons      -1.29
destiny       -1.25
privileges    -1.24
royalties     -1.24
weights       -1.20
Membership    -1.19
props         -1.19
Positive Logits
imensional    1.57
gae           1.54
compr         1.53
ranged        1.46
aretz         1.44
sidx          1.44
umn           1.44
姫            1.43
immer         1.42
auc           1.37
Activations Density 0.001%