Explanations
This neuron detects terms and phrases expressing causality (e.g., “causal,” “cause,” “relationship,” “effect”).
Negative Logits
Token           Logit
гри             -0.07
ollect          -0.07
↵↵              -0.07
izioni          -0.07
RK              -0.06
нання           -0.06
TRE             -0.06
interception    -0.06
SSFWorkbook     -0.06
مبر             -0.06
Positive Logits
Token           Logit
causal          0.12
caus            0.10
casualty        0.07
dů              0.07
قض              0.06
_dash           0.06
sujet           0.06
้าว             0.06
當               0.06
520             0.06
Activations Density: 0.005%
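
The density figure can be read as the share of tokens on which the neuron is active. The sketch below shows one way such a number might be computed. It assumes that "active" means an activation strictly above zero, which this page does not state; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    above the threshold. The dashboard's exact rule is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1 active token out of 20,000 tokens.
sample = [0.0] * 19_999 + [1.3]
print(f"{activation_density(sample):.3%}")  # prints 0.005%
```

Under this reading, a density of 0.005% corresponds to roughly one active token in every 20,000.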