INDEX
Explanations
punctuation marks
The neuron fires on tokens inside the assistant’s (AI’s) response segments; that is, it detects when text comes from the assistant rather than the user.
Negative Logits
ades
-0.06
instances
-0.06
ạm
-0.06
heir
-0.06
questions
-0.06
.enabled
-0.06
Rates
-0.06
düşman
-0.06
radio
-0.06
�
-0.06
Positive Logits
及
0.07
QVERIFY
0.07
КТ
0.07
roveň
0.07
:@{
0.07
)/(
0.06
及
0.06
bied
0.06
ाहत
0.06
smooth
0.06
Activations Density 0.035%
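To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "activation density" means the share of token positions on which the neuron's activation is above zero; the page does not state its exact definition, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions / all positions) * 100.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 3 of them active.
sample = [0.0] * 9997 + [0.4, 1.2, 0.7]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.035% would correspond to the neuron being active on roughly 35 out of every 100,000 tokens.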