    Explanations

    This neuron detects terms and phrases expressing causality (e.g., “causal,” “cause,” “relationship,” “effect”).

    Negative Logits
    Token  Logit
    " гри"  -0.07
    "ollect"  -0.07
    "                ↵↵"  -0.07
    "izioni"  -0.07
    " RK"  -0.06
    "нання"  -0.06
    " TRE"  -0.06
    " interception"  -0.06
    "SSFWorkbook"  -0.06
    "مبر"  -0.06

    Positive Logits
    Token  Logit
    " causal"  0.12
    " caus"  0.10
    " casualty"  0.07
    (blank)  0.07
    " قض"  0.06
    "_dash"  0.06
    " sujet"  0.06
    "้าว"  0.06
    (blank)  0.06
    "520"  0.06

    Act Density 0.005%

    No Known Activations