Explanations
This neuron detects occurrences of the contraction “don’t”.
Negative Logits
(A           -0.08
recognized   -0.07
(*((         -0.07
A            -0.06
Judicial     -0.06
sea          -0.06
temple       -0.06
sums         -0.06
seaw         -0.06
reproduce    -0.06
Positive Logits
Dont         0.09
dont         0.08
dont         0.08
productName  0.08
ont          0.08
't           0.07
’t           0.07
έχουν        0.07
นคร          0.07
俺           0.07
Activations Density 0.023%
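
The page does not define the density figure. One common reading is that it is the fraction of token positions on which the neuron fires at all. Under that assumption, a minimal sketch of the calculation looks like the following; the function name `activation_density`, the zero threshold, and the toy activation list are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a non-zero
    (here: above-threshold) activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of which activate the neuron.
toy_activations = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # formatted as a percentage, like the figure on the page
```

A density as low as the one reported here would mean the neuron is silent on nearly every token, which fits a feature tied to a single contraction.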