INDEX
    Explanations

    references to the word "Ba" followed by various numerical identifiers or context descriptors

    Negative Logits
    "pants"       -0.79
    "lessly"      -0.76
    "wise"        -0.74
    "otle"        -0.68
    " address"    -0.67
    "REDACTED"    -0.67
    "ments"       -0.64
    "dress"       -0.60
    "weed"        -0.60
    " Leone"      -0.60

    Positive Logits
    "uble"         1.20
    "uman"         1.14
    "iley"         1.12
    "umann"        1.11
    "plin"         1.07
    "iting"        1.03
    "atar"         1.01
    "ÅŁ"           0.99
    "ñ"            0.99
    "um"           0.97

    Act Density 0.026%

    No Known Activations