INDEX
Explanations
The neuron detects the occurrence of the phrase “follow up questions.”
Negative Logits
labyrinth  -0.06
overwrite  -0.06
ぁ  -0.06
gods  -0.06
�  -0.06
meta  -0.06
eper  -0.06
ं  -0.06
Pru  -0.06
WISE  -0.06
Positive Logits
THEY  0.06
べて  0.06
stacle  0.06
民  0.06
zvyš  0.06
))/(  0.06
denote  0.06
самых  0.06
-Y  0.06
(&_  0.06
Activations Density 0.130%
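
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed. It assumes that density means the fraction of token positions on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical illustrations, not taken from this page or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which
    the neuron is active. This page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 13 active positions out of 10,000 tokens.
sample = [1.0] * 13 + [0.0] * 9987
print(f"{activation_density(sample):.3%}")
```

Under this assumed definition, a density of 0.130% would mean the neuron fires on roughly 13 of every 10,000 tokens, which fits a detector for one specific phrase.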