Explanations
Punctuation marks and their frequency in the text
Tokens after commas that introduce contrasting information
Contrast and continuation
Negative Logits
Token    Logit
,        -0.94
…        -0.76
「       -0.73
...      -0.69
?        -0.64
.        -0.63
:        -0.60
『       -0.59
'        -0.59
/        -0.55
Positive Logits
Token        Logit
etc          1.16
however      1.11
}}$,         1.07
albeit       0.99
namely       0.91
including    0.90
which        0.89
yaitu        0.89
=,           0.89
though       0.88
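
Logit lists like the two above are commonly produced by projecting a feature's output direction onto each token's unembedding vector and then ranking the results. The sketch below illustrates that idea only: it assumes this is how the values were obtained, which the page does not state. The names `logit_effects` and `top_k` are hypothetical, and the vectors are made-up toy numbers, not data from this feature.

```python
def logit_effects(direction, unembedding, vocab):
    """Dot the feature direction with each token's unembedding vector."""
    effects = {}
    for token, vector in zip(vocab, unembedding):
        effects[token] = sum(d * u for d, u in zip(direction, vector))
    return effects


def top_k(effects, k, positive=True):
    """Return the k most promoted (or most suppressed) tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy 2-dimensional example with invented values (assumption, not real weights).
vocab = ["however", "etc", ",", "?"]
unembedding = [[1.0, 0.5], [1.2, 0.2], [-1.0, 0.1], [-0.5, -0.5]]
direction = [1.0, 0.0]

effects = logit_effects(direction, unembedding, vocab)
print(top_k(effects, 2))                  # most promoted tokens
print(top_k(effects, 2, positive=False))  # most suppressed tokens
```

Under this reading, positive values mark tokens the feature pushes the model toward (here, connectives such as "however" and "albeit"), and negative values mark tokens it pushes away from (here, mostly punctuation).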
Activations Density 2.749%
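
As a rough illustration of what a density figure like this can mean, the sketch below assumes it is the fraction of sampled tokens on which the feature has a nonzero activation. That definition is an assumption, the helper `activation_density` is hypothetical, and the sample values are invented.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Invented per-token activations: the feature fires on 1 of 8 tokens.
sample = [0.0, 0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{100 * density:.3f}%")
```

A value of 2.749% would then mean the feature fires on roughly 1 in 36 tokens.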