INDEX
Explanations
phrases related to seeking help or intervention in a crisis
statements emphasizing change or improvement in circumstances
Negative Logits
ゴン            -0.61
+.              -0.53
respectively    -0.52
destro          -0.51
anwhile         -0.50
arthed          -0.50
etheless        -0.49
hetti           -0.49
arnaev          -0.47
rall            -0.46
Positive Logits
,"              1.02
%"              1.00
")              0.99
,'"             0.95
"]              0.94
"—              0.94
.")             0.94
"),             0.92
[               0.89
..."            0.89
Activations Density 0.867%