INDEX
Explanations
together
The neuron specifically detects the word “together.”
Negative Logits
[s         -0.07
n          -0.07
49         -0.06
rons       -0.06
snippet    -0.06
41         -0.06
Bronx      -0.06
Blood      -0.06
s          -0.06
indices    -0.06
Positive Logits
together    0.15
Together    0.10
Together    0.10
separating  0.09
gether      0.09
close       0.08
detach      0.08
잡          0.08
ighthouse   0.07
peak        0.07
Activations Density 0.020%
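
A density figure like the one above is commonly read as the fraction of sampled tokens on which the neuron fires. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron is active. The source page does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 2 of 10,000 tokens.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.020% corresponds to roughly 2 active tokens per 10,000 sampled, which fits a neuron tied to a single word.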