INDEX
    Explanations

    the presence of the letter 'a' in various contexts within the text

    Negative Logits
    f       -0.23
    g       -0.22
    v       -0.22
    d       -0.22
    r       -0.21
    y       -0.20
    c       -0.18
    ver     -0.18
    h       -0.18
    vier    -0.18
    Positive Logits
    abb     0.30
    ,b      0.29
    /b      0.25
    +b      0.23
    ustin   0.22
    eron    0.21
     href   0.21
    -zA     0.20
    >b      0.20
     aVar   0.19
    Act Density 0.079%

    No Known Activations