Explanations
news articles
The neuron activates on discourse-level connective words, such as contrastive or causal transition markers (“despite,” “results,” “therefore”).
Negative Logits
Token           Logit
ekt             -0.07
kinds           -0.07
wann            -0.06
WCS             -0.06
nights          -0.06
اغ              -0.06
-income         -0.06
systems         -0.06
ibration        -0.06
kind            -0.06
Positive Logits
Token           Logit
knih            0.06
reinforcements  0.06
>",             0.06
):              0.06
ـ               0.06
발              0.06
BAL             0.06
//_             0.06
ruž             0.06
يكا             0.06
Activation Density: 0.090%
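As a rough illustration of what the density figure above may mean, the sketch below assumes it is the fraction of tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; none of them come from the source page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, 9 of which activate the neuron.
sample = [0.0] * 9991 + [1.2] * 9
print(f"{activation_density(sample):.3%}")  # prints 0.090%
```

Under this reading, a density of 0.090% would correspond to the neuron firing on roughly 9 of every 10,000 tokens.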