Explanations
This neuron responds to formal discourse markers in academic papers, such as the words "We," "Our," "The," and "goal," and phrases like "in this setting," which introduce problem statements, assumptions, and main contributions.
Negative Logits
đứng          -0.07
fait          -0.07
ADB           -0.06
Tx            -0.06
flash         -0.06
อร            -0.06
dv            -0.06
.put          -0.06
이러          -0.06
"]],↵         -0.06
Positive Logits
podnik        0.07
_sex          0.06
versatility   0.06
мужчин        0.06
_parser       0.06
ной           0.06
ihrer         0.06
Project       0.06
.squeeze      0.06
(dir          0.06
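The two lists above rank vocabulary tokens by how strongly this neuron's output lowers or raises their logits. The sketch below shows one common way such lists are produced. It is a minimal illustration, not this page's actual pipeline, and it assumes that each token's score is the dot product of the neuron's output weight vector with that token's unembedding column, with no LayerNorm folding. The names `w_out`, `W_U`, and `top_logit_effects` are hypothetical, and random weights stand in for real model parameters.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: score = w_out @ W_U, i.e. the neuron's direct effect on
    each token's logit, ignoring LayerNorm and downstream layers.

    w_out: (d_model,) output weight vector of one neuron (hypothetical input).
    W_U:   (d_model, vocab_size) unembedding matrix.
    vocab: list of token strings of length vocab_size.
    """
    scores = w_out @ W_U                 # direct effect on every token's logit
    order = np.argsort(scores)           # token indices sorted ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy model: random weights stand in for a real network.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
w_out = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_effects(w_out, W_U, vocab, k=5)
print("Negative:", neg)
print("Positive:", pos)
```

Because the scores measure only the direct path to the output, a token appearing in these lists does not by itself mean the neuron fires on that token; the lists describe what the neuron predicts, not what it detects.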
Activation Density: 0.019%
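Activation density is the fraction of token positions on which the neuron is active. At 0.019%, that works out to roughly 19 of every 100,000 tokens, which makes this a very sparse neuron. The sketch below assumes "active" means an activation strictly above a threshold of zero. The page does not state the actual threshold or dataset behind the 0.019% figure, and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a neuron counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: 19 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:19] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.019%
```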