Explanations
sentences related to personal experiences or opinions
phrases related to emotional states or feelings
Negative Logits
`.          -0.70
..."        -0.67
".[         -0.66
thereto     -0.64
."          -0.61
."[         -0.60
?".         -0.59
>]          -0.59
).[         -0.59
-->         -0.59
Positive Logits
Dunham      0.74
Dawkins     0.70
Slate       0.69
Garfield    0.67
Twain       0.65
theorist    0.65
anecdote    0.64
disclaimer  0.63
irony       0.62
VICE        0.62
Activations Density: 1.648%