Explanations
The neuron specifically detects occurrences of the word “short.”
Negative Logits
Marg          -0.07
more          -0.07
поч           -0.07
(Char         -0.06
incarcerated  -0.06
idlo          -0.06
plá           -0.06
deline        -0.06
dads          -0.06
shedding      -0.06
Positive Logits
contempt      0.06
Astronomy     0.06
function      0.06
//'           0.06
Browser       0.06
sp            0.06
EMP           0.06
DIG           0.06
_CC           0.06
CRE           0.06
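Lists like these are commonly produced by ranking the neuron's direct effect on each vocabulary token's logit. A minimal sketch, assuming the values come from projecting the neuron's output weight vector through the model's unembedding matrix (the names `w_out`, `W_U`, and `top_bottom_logits` are hypothetical, not taken from this page):

```python
import numpy as np

def top_bottom_logits(w_out, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumptions (hypothetical names and shapes):
      w_out: (d_model,) output weight vector of a single neuron
      W_U:   (d_model, d_vocab) unembedding matrix
      vocab: list of d_vocab token strings
    """
    # Direct effect of the neuron's output direction on every token's logit.
    logits = w_out @ W_U
    # Indices sorted from most negative to most positive logit.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional model, 3-token vocabulary.
neg, pos = top_bottom_logits(
    np.array([1.0, 0.5]),
    np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Both lists are ordered by magnitude of effect, matching how the negative list starts with the most negative value and the positive list with the most positive.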
Activation Density: 0.013%
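Activation density is the share of tokens on which the neuron fires; at 0.013%, it would be active on roughly 13 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the function name `activation_density` and the threshold are hypothetical):

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    acts = np.asarray(activations, dtype=float)
    # Fraction of tokens above the threshold, expressed as a percentage.
    return float(np.mean(acts > threshold)) * 100.0

print(activation_density([0.0, 0.0, 1.3, 0.0]))
```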