Explanations
The neuron fires on wordpiece tokens that begin multi-syllable or less common words, often proper nouns or technical terms.
Negative Logits
komple      -0.06
verified    -0.06
CAN         -0.06
array       -0.06
signing     -0.06
incare      -0.06
đảng        -0.06
新          -0.05
grades      -0.05
گ           -0.05
Positive Logits
:Event              0.07
responders          0.07
_photos             0.07
\FrameworkBundle    0.07
_lex                0.07
Road                0.06
-capital            0.06
recio               0.06
iq                  0.06
acs                 0.06
Activations Density 0.070%
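
To put the density figure in perspective, here is a minimal sketch of one common way such a number is computed: the fraction of sampled tokens on which the neuron's activation is above zero. The function name `activation_density`, the zero threshold, and the toy data are assumptions made for illustration; the page does not state how its value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The source page does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data (not from the source): 7 active tokens in 10,000.
toy_activations = [0.0] * 9993 + [1.5] * 7
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.070%
```

Under this reading, a density of 0.070% corresponds to the neuron firing on about 7 of every 10,000 tokens.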