INDEX
    Explanations

    variations of the word "pr" in different contexts

    Negative Logits
    " portions"    -0.15
    "i"            -0.15
    " Bri"         -0.14
    "rej"          -0.14
    "erville"      -0.14
    "zon"          -0.14
    "unding"       -0.14
    "ans"          -0.14
    "urb"          -0.14
    "oct"          -0.14
    Positive Logits
    "vi"      0.22
    "vo"      0.20
    "ilik"    0.19
    "ви"      0.18
    "va"      0.18
    "avo"     0.17
    "vim"     0.17
    "vu"      0.17
    "itom"    0.17
    "ve"      0.16
    Activation Density: 0.002%

    No Known Activations