Explanations
From the activations shown, this neuron appears to respond to words ending in "-um".
Occurrences of the token "um".
Negative Logits
cutoff          -0.72
strawberries    -0.69
âĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪ    -0.67
Aires           -0.65
elves           -0.65
Ń·              -0.64
Shades          -0.63
blackout        -0.62
jri             -0.62
Morales         -0.62
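Some entries in the list above, such as "âĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪ", look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The following is a minimal sketch for turning such display strings back into text, assuming the page uses the GPT-2 style byte-to-character mapping (this is an assumption; the page does not name its tokenizer). The helper names `gpt2_byte_table` and `decode_token` are hypothetical and chosen for illustration only.

```python
def gpt2_byte_table():
    """Build the byte -> display-character map used by GPT-2 style byte-level BPE.

    Assumption: the dashboard shows tokens in this display alphabet.
    """
    # Bytes that are shown as themselves (printable Latin-1 ranges).
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to code point 256 + offset, in byte order.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse map: display character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_table().items()}


def decode_token(display: str) -> str:
    """Map each display character back to its byte, then decode as UTF-8.

    Byte sequences that are not valid UTF-8 on their own (for example a token
    that holds only part of a multi-byte character) come back as U+FFFD.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("âĸĪ" * 8))  # run of block characters, if the assumption holds
    print(repr(decode_token("Ń·")))  # partial byte sequence, shown with replacement chars
```

Under this assumption the first token decodes to a run of eight full-block characters, while "Ń·" is not a complete UTF-8 sequence by itself and so cannot be rendered as readable text.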
Positive Logits
osity           1.09
mers            1.09
ming            1.06
essage          0.98
etric           0.98
atism           0.97
mit             0.97
um              0.97
pty             0.96
antic           0.96
Activations Density 0.015%