INDEX
Explanations
phrases related to social influence and community responsibilities
Head Attr Weights
0:0.03
1:0.02
2:0.26
3:0.21
4:0.10
5:0.03
6:0.03
7:0.06
8:0.05
9:0.04
10:0.08
11:0.05
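A minimal sketch for reading the head attribution weights listed above: it ranks heads by weight and reports how much of the listed weight the top few account for. The names `head_weights` and `top_heads` are hypothetical helpers introduced for illustration, not part of any dashboard API; the values are copied from the listing.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.03, 1: 0.02, 2: 0.26, 3: 0.21, 4: 0.10, 5: 0.03,
    6: 0.03, 7: 0.06, 8: 0.05, 9: 0.04, 10: 0.08, 11: 0.05,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)
# Combined weight of the top-k heads, for a rough sense of concentration.
share = sum(w for _, w in top)
print(top, round(share, 2))
```

On these values, heads 2, 3, and 4 rank highest, so the attribution is concentrated in a small group of heads rather than spread evenly.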
Negative Logits
ril         -1.40
coord       -1.32
pex         -1.31
裏          -1.26
機          -1.26
artments    -1.24
gio         -1.22
reau        -1.22
ede         -1.18
rote        -1.18
Positive Logits
200000        1.46
sake          1.44
please        1.33
perenn        1.31
"],"          1.30
adventurous   1.27
uctions       1.24
please        1.23
aspiring      1.22
enduring      1.21
Activations Density 0.147%