Explanations
blog posts, articles
The neuron activates predominantly on content-bearing tokens, especially nouns, adjectives, and other informative words, rather than on common function words.
Negative Logits
minecraft     -0.06
み            -0.06
oci           -0.06
owes          -0.06
داد           -0.06
िद            -0.06
ών            -0.06
گذاری         -0.06
kutje         -0.06
onClick       -0.06
Positive Logits
stown         0.07
culprit       0.07
manuscripts   0.06
costly        0.06
Abort         0.06
bach          0.06
stretch       0.06
.Black        0.06
outlaw        0.06
StyleSheet    0.06
Activation Density: 0.186%