INDEX
Explanations
This neuron fires on occurrences of the word "reason," especially in explanatory phrases like "there’s a reason why."
Negative Logits
gré             -0.07
Blitz           -0.07
#index          -0.07
riot            -0.07
.cod            -0.07
serialization   -0.06
для             -0.06
.Persistent     -0.06
Twig            -0.06
패              -0.06
Positive Logits
neb             0.07
tı              0.06
金              0.06
místě           0.06
Draws           0.06
Arist           0.06
)':             0.06
colum           0.06
legally         0.06
HDC             0.06
Activations Density 0.008%