Explanations
This neuron responds strongly to the word “helpful” (as in the phrase “a helpful and … response”).
Negative Logits
Token  Logit
inction  -0.07
manifest  -0.07
えた  -0.07
critique  -0.07
variety  -0.06
δο  -0.06
ữa  -0.06
出去  -0.06
-expanded  -0.06
.ColumnHeadersHeightSizeMode  -0.06
Positive Logits
Token  Logit
VERN  0.06
yards  0.06
Shib  0.06
वर  0.06
řid  0.06
ponto  0.06
Losing  0.06
Osw  0.06
Nissan  0.06
Leer  0.06
Activations Density 0.014%
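
For a sense of scale, the density figure can be read as a fraction of tokens. The sketch below assumes that "activations density" means the share of sampled tokens on which the neuron's activation is above zero; the page does not state its exact definition. The function name, the threshold, and the sample data are hypothetical and chosen only so that the result matches the 0.014% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (tokens where the neuron fires) / (all tokens sampled).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 14 of 100,000 tokens.
sample = [1.0] * 14 + [0.0] * 99_986
print(f"{activation_density(sample):.3%}")  # prints 0.014%
```

Under that assumption, a density of 0.014% corresponds to roughly 1.4 firing tokens per 10,000, meaning the neuron is very sparse.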