INDEX
Explanations
text related to action or tasks
statements expressing opinions or beliefs
Negative Logits
Token          Logit
respectively   -0.76
+.             -0.72
.).            -0.68
anwhile        -0.67
ordes          -0.61
ãĤ´ãĥ³         -0.60
).             -0.59
arthed         -0.58
iverpool       -0.57
?).            -0.57
Positive Logits
Token          Logit
[              1.18
â̦"            1.17
%"             1.15
..."           1.08
,"             1.00
['             0.99
,''            0.92
,'"            0.88
.,"            0.88
)"             0.88
Activations Density 2.003%