INDEX
Explanations
negative constructs or phrases related to scrutiny and lack of accountability in various contexts
Head Attr Weights
0:0.06
1:0.01
2:0.19
3:0.20
4:0.12
5:0.03
6:0.06
7:0.03
8:0.10
9:0.03
10:0.05
11:0.07
Negative Logits
anonym       -1.46
icing        -1.43
ynchron      -1.39
backlog      -1.38
tumblr       -1.38
mash         -1.31
imagining    -1.29
slur         -1.27
vaguely      -1.27
ranging      -1.25
Positive Logits
nor                  3.54
nor                  2.35
anymore              2.20
inventoryQuantity    1.97
irlf                 1.81
Nor                  1.75
Nor                  1.74
omsky                1.65
ught                 1.64
ridges               1.56
Activations Density 0.019%