INDEX
    Explanations

    unique characters or symbols in text

    Negative Logits
     ('       -0.43
              -0.38
     (        -0.35
     '        -0.33
     (~       -0.30
     («       -0.29
              -0.28
     ("       -0.28
     '[       -0.28
     '(       -0.26
    Positive Logits
    -----↵    0.27
    ----↵     0.26
    ——        0.25
    -↵        0.23
    —I        0.23
    --↵       0.23
    —↵↵       0.22
     —↵       0.21
    ---↵      0.20
    ------↵   0.19
    Activation Density: 0.011%

    No Known Activations