    Explanations

    The neuron is specifically detecting occurrences of the word “premise.”

    Negative Logits
    Token        Logit
    " сообщ"     -0.07
    " Джон"      -0.07
    " voll"      -0.07
    "lerdir"     -0.07
    "ัฒนา"       -0.06
    " stagger"   -0.06
    " almış"     -0.06
    " والن"      -0.06
    ".onPause"   -0.06
    " harmon"    -0.06
    Positive Logits
    Token        Logit
    " premise"   0.08
    "_stmt"      0.06
    "amet"       0.06
    " base"      0.06
    " concept"   0.06
    "_ENDPOINT"  0.06
    " mey"       0.06
    " 선수"      0.06
    "feeding"    0.06
    "base"       0.06
    Activation Density: 0.004%

    No Known Activations