INDEX
    Explanations

    punctuation marks, particularly commas and apostrophes

    Negative Logits
     parís      -0.53
     Winkler    -0.49
     saveiro    -0.46
    <strong>    -0.44
    ambilan     -0.43
    <td>        -0.42
                -0.42
     Jensen     -0.41
    க்          -0.41
                -0.41
    Positive Logits
    ",          1.13
    ',          1.10
    >",         1.01
    .",         0.88
    ),          0.81
    (),         0.80
    ]),         0.79
    "),         0.78
    ],          0.77
    \",\        0.77
    Act Density 0.005%

    No Known Activations