INDEX
    Explanations

    references to "ab" or variations of it

    Negative Logits
    Token    Logit
    y        -0.24
    yk       -0.21
    i        -0.21
    yi       -0.21
    yb       -0.20
    yre      -0.19
    ÛĮ       -0.19
    in       -0.18
    lein     -0.17
    uario    -0.17
    Positive Logits
    Token      Logit
    bing       0.29
    ilitation  0.27
    oard       0.24
    bed        0.24
    ulous      0.23
    bling      0.23
    ba         0.23
    STRACT     0.21
    on         0.21
    ber        0.21
    Act Density 0.027%
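
    The activation density above is a small fraction. As a rough illustration of what such a figure could mean, the sketch below assumes density is the share of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions for illustration, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 27 active positions out of 100,000 sampled tokens.
sample = [1.5] * 27 + [0.0] * (100_000 - 27)
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this assumed definition, a density of 0.027% corresponds to roughly 27 active positions per 100,000 tokens.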

    No Known Activations