Explanations
The neuron fires on frequent function words and other common “structural” tokens (articles, conjunctions, simple verbs, and prepositions) rather than on domain-specific content.
Negative Logits
Token       Logit
buzz        -0.07
诚          -0.07
LARI        -0.07
apologies   -0.06
...'        -0.06
.Chart      -0.06
ificates    -0.06
فقط         -0.06
genotype    -0.06
ECH         -0.06
Positive Logits
Token       Logit
ngine       0.07
mie         0.07
(![         0.07
приб        0.07
随          0.07
�           0.07
(dirname    0.07
WIN         0.06
المن        0.06
layui       0.06
Activations Density 0.039%
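
The density figure can be read as the share of token positions on which the neuron's activation is nonzero. That reading is an assumption, since the page does not define the term. A minimal sketch under that assumption follows; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of positions with a nonzero
    (above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which are active.
sample = [0.0] * 9996 + [0.8, 1.2, 0.3, 2.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

On this reading, a density of 0.039% would correspond to roughly 39 active positions per 100,000 tokens.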