INDEX
Explanations
positive references to societal advancements or improvements
Head Attr Weights
0:0.09
1:0.05
2:0.08
3:0.08
4:0.08
5:0.08
6:0.06
7:0.07
8:0.10
9:0.09
10:0.08
11:0.08
Negative Logits
reviews            -2.10
Review             -2.01
review             -1.88
Judges             -1.81
evaluations        -1.77
CoC                -1.69
recommendations    -1.66
opin               -1.65
Nun                -1.64
Revised            -1.63
Positive Logits
rising             2.31
ortun              1.96
ripp               1.86
onte               1.77
ATER               1.72
whe                1.70
ibrary             1.67
Soon               1.63
river              1.61
ère                1.58
Activations Density 0.000%