Explanations
I'm sorry, but it seems there was an issue with the text for Neuron 4. Would you be able to provide the correct text for Neuron 4 activations so that I can analyze it for you?
Negative Logits
Token              Logit
rules              -0.66
Allied             -0.60
Clair              -0.59
branches           -0.58
electromagnetic    -0.58
PTS                -0.57
tides              -0.56
Malone             -0.56
uniform            -0.56
excess             -0.55
Positive Logits
Token              Logit
'm                 1.40
've                1.26
stanbul            1.24
nex                1.23
EEE                1.19
zzy                1.11
suppose            1.05
'll                1.03
ANS                1.02
ronic              1.01
Activations Density 0.206%