INDEX
Explanations
I'm sorry, but based on the activations provided, I'm unable to determine a specific pattern or theme that neuron 4 is looking for in the text
special characters or non-standard symbols in the text
Negative Logits
Token         Logit
geries        -0.91
raints        -0.72
background    -0.70
distracted    -0.69
foreground    -0.69
offending     -0.68
plain         -0.68
gery          -0.67
Plain         -0.67
dividing      -0.67
Positive Logits
Token         Logit
Ħ             1.41
ij            1.14
¸             1.10
ļ             1.06
ĸ             1.06
и             1.03
Ĺ             1.03
¶             1.02
¼             1.01
¾             1.00
Activation Density: 0.003%