Explanations
survival
The neuron fires on words that signal cooperative effort toward survival (e.g. “work,” “together,” “survive”).
Negative Logits
dese         -0.08
loc          -0.07
ordes        -0.07
igen         -0.07
tion         -0.07
.gt          -0.07
виступ       -0.06
_material    -0.06
isd          -0.06
produit      -0.06
Positive Logits
дат          0.06
(\'          0.06
Observable   0.06
_finish      0.06
арти         0.06
Leo          0.06
onestly      0.06
LEN          0.06
"..          0.06
_totals      0.05
Activations Density: 0.031%