INDEX
Explanations
specific numerical information, such as dates, amounts, or statistics
phrases related to approval and compliance in social contexts
Negative Logits
aturdays    -0.51
<-          -0.49
ワン        -0.48
alloween    -0.46
ーティ      -0.46
Ô           -0.45
Peyton      -0.45
bably       -0.45
_-          -0.45
iPhone      -0.44
Positive Logits
)."         2.23
.")         2.20
.""         2.05
."          2.05
."[         2.04
".          2.03
)",         2.02
..."        2.02
.",         2.01
)"          2.00
Activations Density 1.219%
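
The density figure above is most likely the share of sampled tokens on which the feature fires. The page does not say how it is computed, so the following is only a minimal sketch under that assumption: the `activation_density` helper, the zero threshold, and the sample values are all hypothetical and not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations for 8 tokens; two of them are above zero.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints "25.000%"
```

Under this reading, a density of 1.219% would mean the feature is active on roughly 12 of every 1,000 tokens sampled.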