INDEX
Explanations
content that denotes negativity or disapproval
Negative Logits
Fiscal           -0.80
ND               -0.78
Prob             -0.77
MEN              -0.73
Monetary         -0.73
Expend           -0.73
Feder            -0.72
Confederation    -0.71
Emirates         -0.70
Soc              -0.69
Positive Logits
inducing         1.54
esque            1.52
themed           1.51
style            1.50
shaped           1.47
like             1.46
inspired         1.45
covered          1.43
colored          1.37
filled           1.36
Activations Density 0.062%