INDEX
Explanations
phrases related to societal or governmental issues
expressions of personal opinions or statements of belief
Negative Logits
Token     Logit
.</       -0.71
?).       -0.68
.*        -0.68
.).       -0.62
ayn       -0.60
ãĢĤ       -0.60
arist     -0.59
+.        -0.59
.<        -0.57
aired     -0.56
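The entry `ãĢĤ` in the list above looks like a byte-level BPE rendering of a multi-byte character rather than literal text. The sketch below assumes the tokens come from a GPT-2 style byte-level tokenizer (the source does not say which tokenizer was used) and rebuilds that tokenizer's byte-to-character table to recover the underlying string. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Assumption: the dashboard's tokens follow this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so displayed characters map back to raw bytes.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed vocabulary string back into the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("ãĢĤ")
print(repr(decoded))
```

Under that assumption, `ãĢĤ` corresponds to the bytes `E3 80 82`, which is the UTF-8 encoding of the ideographic full stop (U+3002). That would fit the other sentence-ending punctuation in the negative list.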
Positive Logits
Token     Logit
[         0.99
,"        0.92
,'"       0.79
%"        0.78
initely   0.74
,''       0.73
),"       0.68
incent    0.67
anecd     0.66
.,"       0.65
Activations Density 0.843%
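As a rough illustration of what a density figure like this could measure, the sketch below treats activation density as the fraction of token positions on which the feature's activation is above zero. That definition is an assumption (the source does not define the metric), and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: a feature that fires on 8 of 1000 token positions.
sample = [0.0] * 1000
for i in (3, 120, 121, 400, 555, 700, 812, 999):
    sample[i] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.843% would mean the feature is active on roughly 8 of every 1000 tokens in the sampled text.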