Explanations
This neuron specifically detects mentions of the word “life.”
Negative Logits
Token       Logit
Tex         -0.07
혀          -0.07
-fashion    -0.06
Razor       -0.06
umann       -0.06
YELLOW      -0.06
-picture    -0.06
(render     -0.06
(force      -0.06
emplo       -0.06
Positive Logits
Token       Logit
lives       0.06
life        0.06
Lands       0.06
xOffset     0.06
만들        0.06
heartfelt   0.06
诗          0.06
旦          0.06
simplify    0.06
들에게      0.06
Activations Density: 0.024%
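
As a rough illustration of what a density figure like this measures, the sketch below assumes it is the fraction of sampled token positions on which the neuron's activation is above zero. The page does not state its exact definition, so the threshold, the function name `activation_density`, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires
    at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 3 of 12,500 token positions.
sample = [0.0] * 12_497 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the neuron is silent on the vast majority of tokens, which fits an explanation tied to a single word.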