INDEX
    Explanations

    This neuron activates on the question word “how.”

    Negative Logits
    Token           Logit
    ",False"        -0.07
    " sentence"     -0.07
    "-driver"       -0.07
    " sees"         -0.06
    " descricao"    -0.06
    "[]{↵"          -0.06
    "notation"      -0.06
    "_was"          -0.06
    ".Registry"     -0.06
    " десят"        -0.06
    Positive Logits
    Token           Logit
    " how"          0.09
    "How"           0.08
    " How"          0.07
    " assignable"   0.07
    " أب"           0.06
    " propor"       0.06
    ".esp"          0.06
    " возмож"       0.06
    " Ease"         0.06
    " امکان"        0.06
    Activation Density: 0.042%

    No Known Activations