Explanations
contractions and possessives
Negative Logits
'      0.89
'      0.75
"      0.67
'-     0.66
'-     0.64
',     0.63
'،     0.63
'-'    0.60
','    0.57
'?     0.55
Positive Logits
“‘     0.72
’      0.70
(’     0.68
=’     0.61
(‘     0.61
’’     0.59
)’     0.59
.’     0.57
‘’     0.57
(“     0.56
Activations Density: 0.002%