INDEX
Explanations
I'm sorry, but I couldn't identify a specific pattern or theme based on the activations provided for Neuron 4.
mentions of the location "Burns" and associated entities
Negative Logits
eanor      -0.80
iour       -0.75
rium       -0.74
ntil       -0.72
INAL       -0.71
ascal      -0.69
ffic       -0.69
ilial      -0.68
SLI        -0.67
agonist    -0.67
Positive Logits
Burns      1.08
LEY        0.88
leys       0.78
Burn       0.78
lee        0.78
ford       0.78
ville      0.78
leigh      0.76
ley        0.74
netflix    0.74
Activations Density 0.005%