INDEX
Explanations
phrases containing references to transparency and consumer rights
Head Attr Weights
0:0.01
1:0.01
2:0.11
3:0.05
4:0.29
5:0.02
6:0.10
7:0.13
8:0.03
9:0.03
10:0.11
11:0.05
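
The head attribution weights above are easier to compare once parsed. The following is a minimal sketch, assuming each line is a `head:weight` pair as listed. The names `RAW`, `parse_head_weights`, `top_head`, and `total` are hypothetical and are not part of the dashboard or any library.

```python
# Head attribution weights copied verbatim from the listing above,
# one "head:weight" pair per line.
RAW = """0:0.01
1:0.01
2:0.11
3:0.05
4:0.29
5:0.02
6:0.10
7:0.13
8:0.03
9:0.03
10:0.11
11:0.05"""


def parse_head_weights(raw: str) -> dict[int, float]:
    """Parse 'head:weight' lines into a {head_index: weight} mapping."""
    weights = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, weight = line.split(":", 1)
        weights[int(head)] = float(weight)
    return weights


weights = parse_head_weights(RAW)

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Sum of the displayed (rounded) weights.
total = sum(weights.values())

print(f"top head: {top_head} (weight {weights[top_head]:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

With the values listed, head 4 carries the largest weight (0.29) and the displayed weights sum to about 0.94. The gap from 1 may simply reflect rounding to two decimals; the listing does not state whether the weights are normalized.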
Negative Logits
�:-1.61
opian:-1.59
zar:-1.57
�:-1.46
ь:-1.43
Lup:-1.40
STAR:-1.39
alien:-1.39
Biological:-1.38
Rus:-1.37
Positive Logits
challengers:1.69
contestants:1.65
disclaim:1.64
dominates:1.59
vendors:1.59
usra:1.56
mounted:1.54
vulner:1.53
challeng:1.50
flanked:1.49
Activations Density 0.001%