INDEX
    Explanations

    occurrences of the word "og"

    Negative Logits
    y         -0.24
    o         -0.19
    g         -0.18
    yb        -0.17
    nell      -0.16
    nelle     -0.15
    yne       -0.15
    sip       -0.15
    eton      -0.15
    s         -0.15
    Positive Logits
    ues        0.30
    lio        0.26
    ei         0.25
    ging       0.25
    gers       0.24
    eo         0.24
    gy         0.23
    ey         0.23
    lu         0.22
    getto      0.21
    Activation Density: 0.023%

    No Known Activations