    Explanations

    interpolate

    The neuron fires on occurrences of “interpolation” and close derivatives such as “interpolate” and “interpolating,” and also on “extrapolation,” which suggests it is keying on the shared “polat” subword.
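As a rough illustration of the explanation above, the sketch below checks which words share the “polat” substring. It is only a surface-level proxy: the helper `contains_polat` and the word list are hypothetical, and the code does not query the neuron or a tokenizer.

```python
# Hypothetical helper: a string-level proxy for the explanation,
# not a measurement of the neuron's actual activations.
def contains_polat(word: str) -> bool:
    """Return True if the word contains the 'polat' substring (case-insensitive)."""
    return "polat" in word.lower()

# Illustrative word list (assumption): terms named in the explanation
# plus two unrelated words for contrast.
words = [
    "interpolation",
    "interpolate",
    "interpolating",
    "extrapolation",
    "Interpolator",
    "catalogue",
    "Raymond",
]

matches = [w for w in words if contains_polat(w)]
print(matches)
```

A substring check like this only approximates the explanation; the model operates on tokens, so the real trigger depends on how the tokenizer splits each word.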

    Negative Logits
     Raymond      -0.07
     Walter       -0.07
     catalogue    -0.07
    [W            -0.07
     cause        -0.07
     except       -0.06
     waste        -0.06
     sight        -0.06
     지정         -0.06
     inj          -0.06
    Positive Logits
    Interpolator      0.08
     interpolation    0.08
     interpolated     0.08
    polate            0.08
     interpol         0.07
    .interpolate      0.07
     interpolate      0.07
    _strings          0.07
    polation          0.07
                      0.07
    Act Density 0.002%

    No Known Activations