INDEX
    Explanations

    which, który, который

    Negative Logits
    ividual     0.75
    airement    0.74
    iscuit      0.71
    nent        0.70
    steak       0.66
    nue         0.65
     entera     0.65
    cap         0.64
     saja       0.64
    openia      0.64
    Positive Logits
     که          1.14
     jotka       1.08
     които       1.07
     которые     1.06
     który       1.06
     which       1.02
     который     0.99
     jonka       0.98
     who         0.96
     które       0.96
    Act Density 0.035%

    No Known Activations