Explanations
conversational text
The neuron fires on utterances that express agreement, acknowledgement, or apology directed at the user (e.g., "You are right", "I apologize", "I understand").
Negative Logits
níku      -0.07
буде      -0.07
хов       -0.07
gr        -0.06
HEL       -0.06
Juli      -0.06
lahoma    -0.06
зави      -0.06
�         -0.06
coraz     -0.06
Positive Logits
marc       0.08
marque     0.07
Thank      0.07
および     0.06
}↵         0.06
mortal     0.06
,↵         0.06
framerate  0.06
Struct     0.06
Label      0.06
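Lists like Negative Logits and Positive Logits are usually read as the tokens whose output logits this unit most decreases or increases. The sketch below is minimal and rests on one assumption: that each score is the dot product of the unit's output direction with a token's unembedding vector (a logit-lens style projection). The function names, toy vocabulary, and vectors are hypothetical. This is not the dashboard's actual code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a unit's output direction, plus a toy
# unembedding table that maps each token string to its unembedding vector.
Vector = List[float]


def logit_effects(direction: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Project the direction onto every token's unembedding vector (dot product)."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]                  # ascending: most negative first
    positive = list(reversed(ranked))[:k]  # descending: most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy data only; these vectors are made up for illustration.
    direction = [1.0, -0.5]
    unembed = {
        "tok_a": [0.10, -0.05],
        "tok_b": [0.06, -0.04],
        "tok_c": [-0.04, 0.04],
        "tok_d": [-0.05, 0.04],
    }
    negative, positive = top_and_bottom(logit_effects(direction, unembed), k=2)
    print("negative:", negative)
    print("positive:", positive)
```

With real model weights, the same ranking would run over the full vocabulary, and k=10 would give lists of the length shown here.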
Activations Density 0.065%
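Activation density is commonly reported as the share of tokens in a reference dataset on which the unit is active. The sketch below assumes "active" means an activation strictly above a threshold of zero. The threshold and the function name are hypothetical. Under that reading, a density of 0.065% works out to roughly 65 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy data: 65 firing tokens out of 100,000 gives a density of 0.065%.
    acts = [0.0] * 99_935 + [1.0] * 65
    print(f"{activation_density(acts):.3f}%")
```

A density this low suggests the unit is highly selective, which fits the narrow conversational trigger described in the explanation.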