Explanations
expressions of opinion or commentary on social issues
Negative Logits
Token         Logit
interpreted   -0.84
nonexistent   -0.79
perce         -0.78
suspected     -0.78
sensitive     -0.78
perceived     -0.77
inclusion     -0.76
analys        -0.76
arbit         -0.76
subt          -0.76
Positive Logits
Token                              Logit
Advertisements                     1.48
END                                1.40
Conclusion                         1.38
********************************   1.36
________________________________   1.36
________________________           1.34
Thank                              1.28
EDIT                               1.27
References                         1.25
Thanks                             1.25
Activations Density: 0.355%