    Explanations

    quotation marks and their associated punctuation

    Negative Logits
    '));        -0.93
    }');        -0.93
    %");        -0.88
    ...');      -0.88
    ]');        -0.86
    _           -0.84
    ...         -0.81
    )');        -0.80
    \\          -0.79
    }';         -0.79
    Positive Logits
     “           1.98
                 1.97
     "           1.57
     ‘           1.49
    (“           1.43
    ("           1.42
    =”           1.41
    ,“           1.40
    、“          1.32
    =“           1.30
    Act Density 0.292%

    No Known Activations