    Explanations

    special formatting or markers in the text

    Negative Logits
     -"
    -1.12
     '"
    -1.09
     '
    -1.02
     -'
    -1.02
    ^(@)
    -1.02
     Mendes
    -1.00
    。"
    -0.98
     Humboldt
    -0.98
    ...'
    -0.97
     "
    -0.96
    Positive Logits
    [token not captured]
    1.53
     “
    1.50
    ,”
    1.46
    [token not captured]
    1.44
    .”
    1.44
    ?”
    1.42
    [token not captured]
    1.41
    (“
    1.39
    ”,
    1.36
     (“
    1.34
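The negative and positive lists rank vocabulary tokens by how strongly this feature suppresses or promotes them. Below is a minimal sketch of how such lists can be produced. It assumes, without confirmation from this page, that each score is the feature's decoder direction multiplied by the model's unembedding matrix W_U. The helper names `feature_logits` and `top_and_bottom` and the toy data are hypothetical.

```python
# Hypothetical sketch: score every vocabulary token by projecting one
# feature's decoder direction through the unembedding matrix W_U.
# ASSUMPTION: score = decoder_direction @ W_U (no layer norm or bias folded in).

def feature_logits(decoder_direction, unembed):
    """decoder_direction: d_model floats; unembed: d_model rows of vocab_size floats."""
    vocab_size = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_direction, unembed))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, tokens, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    pairs = list(zip(tokens, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens.
direction = [1.0, -0.5]
W_U = [[0.2, 1.0, -0.3],
       [0.4, -1.0, 0.6]]
tokens = ["a", "b", "c"]
scores = feature_logits(direction, W_U)
positive, negative = top_and_bottom(scores, tokens, k=1)
print("positive:", positive)
print("negative:", negative)
```

Sorting the full vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.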
    Activation Density 0.222%
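Activation density is the share of token positions on which the feature fires. Below is a minimal sketch. It assumes that "fires" means an activation strictly above zero, which is an assumption and not something this page states. `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch of activation density: the percentage of token
# positions on which a feature's activation exceeds a threshold.
# ASSUMPTION: "active" means activation > 0.0; the real pipeline may differ.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0  # avoid division by zero on an empty sample
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy sample: 900 token positions, the feature fires on 2 of them.
sample = [0.0] * 898 + [1.3, 0.7]
print(f"Activation density {activation_density(sample):.3f}%")
```

A density of 0.222% means the feature is active on roughly 1 token in 450 (1/450 ≈ 0.222%).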

    No Known Activations