Explanations
interpolate
The neuron fires on “interpolation” and closely related words (e.g. “interpolate,” “interpolating,” and even “extrapolation”), apparently keying on the “polat” subword these terms share.
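As a rough sanity check on the explanation above, the short sketch below tests whether each example word contains the “polat” substring. It is only an illustration: the names `EXAMPLE_WORDS`, `SUBWORD`, and `contains_subword` are hypothetical, and plain substring matching stands in for the model's real tokenizer, which is not given here.

```python
# Hypothetical check: the words are copied from the explanation text.
# Substring matching is an assumption; the model's actual tokenization is unknown.
EXAMPLE_WORDS = ["interpolation", "interpolate", "interpolating", "extrapolation"]
SUBWORD = "polat"


def contains_subword(word: str, subword: str = SUBWORD) -> bool:
    """Return True if `subword` occurs in `word`, ignoring case."""
    return subword in word.lower()


# Map each example word to whether it contains the shared subword.
shared = {word: contains_subword(word) for word in EXAMPLE_WORDS}

for word, hit in shared.items():
    print(f"{word}: {hit}")
```

All four example words contain the substring, which is consistent with the explanation but does not by itself show what the neuron responds to.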
Negative Logits
Raymond      -0.07
Walter       -0.07
catalogue    -0.07
[W           -0.07
cause        -0.07
except       -0.06
waste        -0.06
sight        -0.06
지정         -0.06
inj          -0.06
Positive Logits
Interpolator    0.08
interpolation   0.08
interpolated    0.08
polate          0.08
interpol        0.07
.interpolate    0.07
interpolate     0.07
_strings        0.07
polation        0.07
탕              0.07
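The positive-logit tokens listed above can be compared against the “polat” explanation with a small sketch. The token/value pairs are copied from the list; the names `TOP_POSITIVE_LOGITS` and `split_by_subword` are hypothetical, and case-insensitive substring matching is an assumption rather than how the dashboard computes anything.

```python
# Token/value pairs copied from the positive-logit list on this page.
TOP_POSITIVE_LOGITS = [
    ("Interpolator", 0.08),
    ("interpolation", 0.08),
    ("interpolated", 0.08),
    ("polate", 0.08),
    ("interpol", 0.07),
    (".interpolate", 0.07),
    ("interpolate", 0.07),
    ("_strings", 0.07),
    ("polation", 0.07),
    ("탕", 0.07),
]


def split_by_subword(rows, subword="polat"):
    """Split tokens into those that contain `subword` (case-insensitive) and the rest."""
    with_subword = [tok for tok, _ in rows if subword in tok.lower()]
    without_subword = [tok for tok, _ in rows if subword not in tok.lower()]
    return with_subword, without_subword


matching, other = split_by_subword(TOP_POSITIVE_LOGITS)
print(f"{len(matching)} of {len(TOP_POSITIVE_LOGITS)} tokens contain 'polat'")
print("tokens without it:", other)
```

Under this assumption, most of the promoted tokens contain the subword, while a few (such as `_strings`) do not, so the substring reading is a useful summary rather than a complete account of the list.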
Activations Density 0.002%