INDEX
    Explanations

    instances of parentheses and their associated content

    Negative Logits
    "æıı"        -0.15
    " Boyle"     -0.14
    "éĨ"         -0.14
    "uelle"      -0.14
    "ANTA"       -0.14
    "umas"       -0.14
    "æĪ"         -0.14
    " Butter"    -0.13
    " README"    -0.13
    "Spy"        -0.13
    Positive Logits
    " literal"      0.66
    " literally"    0.63
    " Liter"        0.60
    "liter"         0.57
    " pun"          0.56
    "pun"           0.53
    "literal"       0.50
    " figur"        0.50
    " Literal"      0.50
    "Liter"         0.48
    Activation Density: 0.108%

    No Known Activations