    Explanations

    punctuation, particularly parentheses and brackets

    Negative Logits
    Token      Logit
    𝐝          -0.64
     myth      -0.61
    𝐮          -0.60
     Gla       -0.59
     Hamb      -0.57
     the       -0.56
    𝐡          -0.55
     Chit      -0.54
     Jop       -0.54
    ad         -0.54
    Positive Logits
    Token        Logit
    })).         1.35
    __).         1.32
    ()).         1.29
    ))).         1.28
    ])).         1.28
    expandindo   1.27
    }`).         1.24
    )).          1.24
    ")).         1.20
    ').          1.20
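    On sparse-autoencoder feature dashboards, positive and negative logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k largest and k smallest entries. The sketch below assumes that procedure. `top_logit_effects`, `decoder_dir`, `unembed`, and the toy vocabulary and weights are hypothetical stand-ins, not this feature's actual data, and some pipelines also fold in a final normalization step that is omitted here.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project one feature's decoder direction through an unembedding matrix.

    decoder_dir: the feature's decoder vector, length d_model (assumed input).
    unembed: d_model x vocab_size matrix indexed as unembed[i][j] (assumed input).
    Returns (most_negative, most_positive), each a list of k (token, value) pairs.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")
    # Logit effect on token j: dot product of the decoder direction with column j.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens from most negative to most positive effect.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = list(reversed(ranked))[:k]
    return most_negative, most_positive


# Toy example with hypothetical numbers (d_model = 2, vocab_size = 4).
vocab = ["))", " the", "])", " myth"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.9, -0.2, 0.8, -0.3],
    [-0.4, 0.6, -0.2, 0.5],
]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

    Plain lists keep the sketch dependency-free; a real pipeline would typically do the same projection as a single matrix-vector product in a tensor library.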
    Activation density: 0.062%

    No Known Activations
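    Activation density is usually reported as the share of sampled tokens on which the feature fires, that is, has an activation above zero. Under that assumption, 0.062% = 0.00062, or roughly 62 firing tokens per 100,000. The minimal sketch below computes the percentage this way. `activation_density`, its `threshold` parameter, and the synthetic sample are hypothetical illustrations, not the dashboard's own code or data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold` (assumed definition)."""
    total = 0
    firing = 0
    for value in activations:
        total += 1
        if value > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * firing / total


# Hypothetical sample: 100,000 tokens, the feature fires on 62 of them.
sample = [0.0] * 100_000
for idx in range(0, 62_000, 1_000):
    sample[idx] = 1.5
density = activation_density(sample)
print(f"Activation density: {density:.3f}%")  # prints "Activation density: 0.062%"
```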