Explanations
The neuron specifically signals on occurrences of the word “afraid.”
Negative Logits
Token       Logit
Winter      -0.07
ilmiştir    -0.07
(internal   -0.07
ו�          -0.07
uttle       -0.07
iter        -0.06
Zur         -0.06
silver      -0.06
Gerr        -0.06
ers         -0.06
Positive Logits
Token       Logit
afraid      0.14
raid        0.07
said        0.07
.ReadFile   0.07
ashamed     0.07
ifen        0.07
aque        0.07
endPoint    0.07
_backend    0.06
readOnly    0.06
Activations Density 0.004%
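
A density this low means the neuron fires on only a tiny share of tokens. The sketch below shows one way such a figure could be computed, assuming density is defined as the fraction of sampled tokens whose activation exceeds zero. That definition, the function name `activation_density`, the threshold, and the sample data are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of firing tokens; the page
    itself does not state how its figure was computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 4 firing tokens out of 100,000.
sample = [0.0] * 99_996 + [1.2, 0.8, 2.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.004%
```

Under this reading, 0.004% corresponds to roughly 4 firing tokens per 100,000, which fits a neuron tied to a single word such as “afraid.”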