Explanations
words related to politics and government affairs
terms related to negative implications and consequences, especially in the context of reputation and moral judgment
Head Attr Weights
0:0.06
1:0.03
2:0.43
3:0.06
4:0.11
5:0.05
6:0.01
7:0.02
8:0.06
9:0.05
10:0.05
11:0.02
Negative Logits
uddin             -1.38
pockets           -1.24
querque           -1.22
umbai             -1.21
pond              -1.17
sophistication    -1.17
oha               -1.13
QB                -1.11
ibrary            -1.10
arius             -1.10
Positive Logits
ctive             1.50
ction             1.44
lishes            1.42
ciation           1.41
ォ                1.38
lished            1.36
issance           1.32
elected           1.23
untled            1.19
ancing            1.18
Activations Density 0.012%