Explanations
The neuron consistently activates on the word “never,” especially in first-person statements about something not done before.
Negative Logits
RECEIVE    -0.07
ticker     -0.07
Earn       -0.07
quiz       -0.06
_parsed    -0.06
Become     -0.06
repair     -0.06
earned     -0.06
earns      -0.06
منزل       -0.06
Positive Logits
pest         0.06
Separator    0.06
Ú            0.06
Ju           0.06
_Renderer    0.06
startDate    0.06
非           0.06
/non         0.06
hran         0.06
noen         0.06
Activations Density: 0.012%