Explanations
contractions and possessive forms in the text
Negative Logits
Token   Logit
“       -1.06
”       -1.05
’       -1.02
‘       -1.01
”,      -0.97
?”      -0.96
’,      -0.96
=”      -0.96
.”      -0.93
”.      -0.92
Positive Logits
Token       Logit
houſe       1.16
Houſe       1.08
Efq         1.05
ſtate       1.04
ftate       1.00
purpoſe     0.99
poffe       0.98
greateſt    0.98
་་          0.96
ſtand       0.96
Activations Density: 0.104%