INDEX
    Explanations

    punctuation and symbols, particularly parentheses

    Negative Logits
    Token        Logit
    "ebi"        -0.16
    "hower"      -0.15
    " powers"    -0.15
    "andro"      -0.15
    "odyn"       -0.14
    "oux"        -0.14
    " Briggs"    -0.14
    "licos"      -0.14
    "GW"         -0.14
    "ech"        -0.14
    Positive Logits
    Token        Logit
    " Bolt"      0.15
    "drv"        0.15
    "ribbon"     0.14
    "mia"        0.14
    "Telegram"   0.14
    "_tF"        0.14
    "cep"        0.14
    "amik"       0.14
    "ÄĻd"        0.14
    "du"         0.14
    Activation Density: 0.003%

    No Known Activations