INDEX
Explanations
special formatting or markers in the text
Negative Logits
Token      Logit
-"         -1.12
'"         -1.09
'          -1.02
-'         -1.02
^(@)       -1.02
Mendes     -1.00
。"        -0.98
Humboldt   -0.98
...'       -0.97
"          -0.96
Positive Logits
Token      Logit
”          1.53
“          1.50
,”         1.46
’          1.44
.”         1.44
?”         1.42
“          1.41
(“         1.39
”,         1.36
(“         1.34
Activations Density: 0.222%