INDEX
Explanations
Neuron 4 did not activate on any token in the provided examples, which suggests it responds to a pattern or feature that is absent from these text samples.
Negative Logits
rocks         -0.79
inas          -0.72
rall          -0.70
scaling       -0.68
inline        -0.64
nutshell      -0.64
gem           -0.62
Interstitial  -0.62
clad          -0.61
pel           -0.60
Positive Logits
etsk       0.85
uberty     0.83
olitan     0.78
Seb        0.77
Sov        0.73
Everybody  0.73
Neb        0.73
Flavoring  0.71
Nobody     0.69
ĨĴ         0.69
Activations Density 0.000%
No Known Activations
This feature has no known activations.
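As a rough illustration of the "Activations Density 0.000%" figure, the sketch below assumes density means the fraction of sampled token positions on which the feature's activation exceeds zero. That definition, the function name `activation_density`, and the sample values are assumptions made for illustration, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position over a sample of text.
    """
    if not activations:
        # No tokens sampled: report zero rather than divide by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample in which the feature never fires.
sample = [0.0] * 8
print(f"{activation_density(sample):.3%}")
```

Under this assumed definition, a feature that never fires on the sampled text reports a density of zero, which is consistent with the dashboard showing no known activations.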