INDEX
Explanations
statements related to storytelling or narrative context
Negative Logits
,       -0.98
「      -0.80
…       -0.79
...     -0.75
?       -0.66
/       -0.65
『      -0.64
:       -0.64
"       -0.64
."—     -0.63
Positive Logits
etc         1.39
however     1.23
but         1.19
albeit      1.18
including   1.12
which       1.12
namely      1.09
}}$,        1.09
especially  1.01
although    0.98
Activations Density: 2.504%