INDEX
    Explanations

    references to the word "lamb"

    Negative Logits
    pector
    -0.16
    yles
    -0.16
    nist
    -0.15
    едера
    -0.15
    IFE
    -0.15
     increment
    -0.15
    zb
    -0.15
    .dex
    -0.15
    uder
    -0.14
    Incre
    -0.14
    Positive Logits
    orghini
    0.41
    recht
    0.29
    orgh
    0.27
    erti
    0.27
    chop
    0.27
    das
    0.26
    ret
    0.25
    erts
    0.25
    eth
    0.25
    ertz
    0.25
    Act Density 0.007%

    No Known Activations