Explanations
Looking through the activations provided, this neuron appears to fire consistently on occurrences of the word "take" in various contexts.
Instances of the phrase "take a" followed by various contexts.
Negative Logits
ndra        -0.79
ells        -0.71
Ü           -0.69
displays    -0.68
tions       -0.66
tu          -0.65
IAS         -0.65
IOR         -0.64
eller       -0.64
α           -0.63
Positive Logits
seriously    0.91
lightly      0.84
plunge       0.83
reins        0.81
cue          0.78
stride       0.78
tumble       0.73
ドラ         0.70
lesson       0.67
tack         0.67
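The two logit lists rank vocabulary tokens by how much this neuron pushes their logits down or up. A minimal sketch of one common way to get such a ranking follows, assuming the neuron's output weight vector is projected directly onto the unembedding matrix (ignoring the final LayerNorm and any rescaling the dashboard may apply). The names `logit_effects`, `top_and_bottom`, `w_out` and `W_U` and the toy numbers are hypothetical, not the dashboard's actual code or weights.

```python
from typing import List, Sequence, Tuple

def logit_effects(w_out: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of the neuron's output direction with each vocab column.

    Assumption: w_out has length d_model and W_U is a d_model x d_vocab
    matrix stored as a list of rows; LayerNorm is ignored.
    """
    d_vocab = len(W_U[0])
    return [sum(w_out[i] * W_U[i][j] for i in range(len(w_out)))
            for j in range(d_vocab)]

def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]],
                                    List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom

# Toy example with a 2-dimensional residual stream and 4 tokens (made-up values).
vocab = ["seriously", "lightly", "ndra", "ells"]
w_out = [1.0, -0.5]
W_U = [[0.6, 0.5, -0.4, -0.3],
       [-0.6, -0.6, 0.8, 0.8]]
top, bottom = top_and_bottom(logit_effects(w_out, W_U), vocab, k=2)
print([tok for tok, _ in top])     # ['seriously', 'lightly']
print([tok for tok, _ in bottom])  # ['ndra', 'ells']
```

Sorting the full vocabulary once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nlargest`) would avoid the full sort.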
Activation Density 0.147%
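Activation density is the share of tokens on which the neuron is active; 0.147% works out to roughly 1.47 active tokens per 1,000. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the function name `activation_density` and the `threshold` parameter are hypothetical, not taken from the dashboard):

```python
from typing import Sequence

def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# One active token out of ten gives a density of 10%.
print(activation_density([0.0] * 9 + [2.5]))  # 10.0
```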