INDEX
Explanations
expressions of frustration or disappointment
Responses to questions/statements
Negative Logits
]),      -1.13
".       -1.05
'),      -1.00
"),      -1.00
")");    -0.97
%");     -0.97
"},      -0.96
")));    -0.96
$_"      -0.96
"],      -0.94
Positive Logits
yeah     0.98
maybe    0.94
...      0.92
sorry    0.92
Maybe    0.90
haha     0.90
....     0.89
:)       0.87
gonna    0.86
you      0.86
Activations Density 0.325%