INDEX
    Explanations

    capitalized letter "P" and its variations in different contexts

    Negative Logits
     attent   -0.15
    inka      -0.15
    edral     -0.15
    份         -0.15
    ournée    -0.14
    IAS       -0.14
    aktion    -0.14
    RESS      -0.14
    autor     -0.14
    icina     -0.14
    Positive Logits
    im        0.20
    agem      0.20
    aser      0.19
    ings      0.18
    ures      0.18
    aged      0.18
    iped      0.18
    burg      0.17
    atches    0.16
    cre       0.16
    Act Density 0.032%

    No Known Activations