    Explanations

    This neuron activates primarily on the standalone word “crash” (in any context or casing).

    Negative Logits
    Token            Logit
    " Intent"        -0.07
    "edu"            -0.06
    " reputed"       -0.06
    " Activ"         -0.06
    " voluntary"     -0.06
    "Policy"         -0.06
    (token missing)  -0.06
    " Dol"           -0.06
    " evaluating"    -0.06
    "діл"            -0.06
    Positive Logits
    Token            Logit
    " crash"         0.14
    " crashes"       0.12
    " Crash"         0.11
    " crashed"       0.10
    " crashing"      0.08
    "irsch"          0.08
    " clashes"       0.08
    (token missing)  0.08
    " CR"            0.07
    "_trampoline"    0.07
    Activation Density: 0.004%

    No Known Activations