    Explanations

    The neuron is primarily detecting occurrences of the word “comfortable.”

    Negative Logits
    ショ   -0.07
    об   -0.07
    ленно   -0.06
     byli   -0.06
    集中   -0.06
    协议   -0.06
    _mat   -0.06
     decoder   -0.06
     Stick   -0.06
     tor   -0.06
    Positive Logits
     sucked   0.07
    .concatenate   0.07
    "](   0.07
    .optimizer   0.06
     extracting   0.06
    lus   0.06
    (whitespace)   0.06
     Carolina   0.06
     limp   0.06
     funded   0.06
    Activation Density: 0.007%

    No Known Activations