INDEX
    Explanations

    This neuron detects hedging or implication language: phrases that qualify results and call for further study, validation, or clarification.

    Negative Logits
    Token                  Logit
    "ека"                  -0.06
    (token not rendered)   -0.06
    "…ط"                   -0.06
    " biggest"             -0.06
    " inhibited"           -0.06
    " CD"                  -0.06
    " E"                   -0.06
    " Joshua"              -0.06
    " п"                   -0.06
    " canyon"              -0.06

    Positive Logits
    Token                  Logit
    "δί"                   0.07
    "атем"                 0.07
    "하는데"               0.06
    "lod"                  0.06
    "うち"                 0.06
    "textView"             0.06
    "£"                    0.06
    "aget"                 0.06
    "scrollView"           0.06
    "_VOID"                0.06

    Activation Density: 0.146%

    No Known Activations