INDEX
Explanations
phrases related to emotions and personal experiences
expressions of strong emotions or statements of clarity
Negative Logits
Token       Logit
REDACTED    -0.63
ãĢij         -0.62
îĢ          -0.60
().         -0.58
shall       -0.57
.*          -0.57
().         -0.56
etheless    -0.54
NOW         -0.54
.(          -0.54
Positive Logits
Token       Logit
[           0.98
,"          0.93
,'"         0.87
),"         0.86
,''         0.77
.,"         0.75
everybody   0.72
['          0.72
somebody    0.67
,'          0.66
Activations Density: 1.242%