INDEX
Explanations
phrases related to predictions, growth, and factual statements
statements expressing predictions or assessments of potential outcomes
Negative Logits
?).      -0.73
.).      -0.70
).       -0.61
+.       -0.59
.)       -0.56
arthed   -0.56
).       -0.56
().      -0.56
.(       -0.55
%.       -0.55
Positive Logits
,"     1.10
[      1.07
%"     1.05
,'"    0.96
.,"    0.92
…"     0.87
":     0.86
),"    0.85
,''    0.84
"      0.83
Activations Density 1.092%