Explanations
I'm sorry, but based on the provided activations, it is difficult to clearly summarize what Neuron 4 is looking for. There is a variety of seemingly random text content, making it challenging to pinpoint a specific pattern or theme that the neuron is primarily activating for.
Mentions of the website "Infowars."
Negative Logits
Rolls           -0.68
bye             -0.66
Dunham          -0.62
uninterrupted   -0.62
bom             -0.61
valiant         -0.59
Winston         -0.58
successors      -0.58
davidjl         -0.57
Hug             -0.57
Positive Logits
owship          1.01
cial            0.99
ructure         0.97
iband           0.97
renheit         0.95
essim           0.90
ruct            0.89
urated          0.81
aneous          0.79
luster          0.79
Activation density: 0.036%
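The two ranked lists and the density figure are summary statistics of the kind that can be computed from a neuron's per-token logit effects and its activations over a dataset. The sketch below is a minimal illustration under stated assumptions, not the viewer's actual code: it assumes the logit effects are available as a plain token-to-value mapping, and it assumes "density" means the fraction of tokens on which the activation is strictly above zero (the page does not state its definition). The function names `top_and_bottom_logits` and `activation_density` are hypothetical.

```python
import heapq

def top_and_bottom_logits(logit_effects, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    items = list(logit_effects.items())
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy values copied from the lists above; a real vocabulary has tens of thousands of tokens.
effects = {"owship": 1.01, "cial": 0.99, "ructure": 0.97,
           "Rolls": -0.68, "bye": -0.66, "Dunham": -0.62}
pos, neg = top_and_bottom_logits(effects, k=2)
print(pos)  # [('owship', 1.01), ('cial', 0.99)]
print(neg)  # [('Rolls', -0.68), ('bye', -0.66)]

# Hypothetical activations: 36 active tokens out of 100,000 gives 0.036%.
acts = [1.0] * 36 + [0.0] * (100_000 - 36)
print(f"{activation_density(acts):.3%}")  # 0.036%
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only the top and bottom k entries are needed.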