INDEX
Explanations
statements related to current events, politics, and policy
positive sentiments and expressions of approval related to governance or policies
Negative Logits
Token         Logit
ゴン          -1.02
etheless      -0.88
quished       -0.73
.).           -0.72
。            -0.71
.(            -0.70
%.            -0.68
?).           -0.67
().           -0.66
Annotations   -0.66
Positive Logits
Token         Logit
,'"           1.48
,"            1.45
[             1.28
),"           1.27
,''           1.20
.,"           1.06
',"           1.05
,'            1.05
['            1.05
%"            1.05
Activations Density 1.399%
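
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes density means the percentage of positions with an activation above zero, which this page does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of token positions on which the
    feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1000 token positions, 14 of which activate the feature.
sample = [0.0] * 986 + [0.5] * 14
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density near 1.4% would mean the feature is active on roughly 14 of every 1000 token positions.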