Explanations
The main thing this neuron does is find words related to consequences or outcomes
Negative Logits
sites       -0.68
Ribbon      -0.63
Robo        -0.60
pload       -0.58
Fired       -0.58
Sau         -0.58
voic        -0.58
mbuds       -0.57
Spoon       -0.57
Scholars    -0.56
Positive Logits
thereof     1.19
of          0.91
ainer       0.76
iveness     0.74
OF          0.71
growth      0.70
of          0.70
ivity       0.67
stemming    0.67
ively       0.65
Activations Density 0.038%