Explanations
words that people frequently use when chatting or being interviewed.
Quotation marks
Negative Logits
'       -3.13
'       -1.87
'.      -1.63
$'      -1.62
)'      -1.55
.'      -1.54
'"      -1.52
\'      -1.49
}'      -1.45
'...    -1.45
Positive Logits
”)      1.05
”).     1.02
.”)     0.98
”),     0.97
.”      0.97
?”      0.91
”.      0.91
,”      0.90
”,      0.88
”]      0.85
Activations Density: 54.927%