Explanations
single or paired quotation marks and apostrophes
Negative Logits
“        -0.80
""),     -0.77
""))     -0.74
"        -0.73
"").     -0.73
"/")     -0.71
”        -0.65
"",      -0.65
"—       -0.62
"":      -0.62
Positive Logits
'        2.71
‘        2.47
‚        1.76
-'       1.70
('       1.70
’        1.69
(‘       1.69
='       1.54
`        1.53
『        1.47
Activations Density 0.139%