INDEX
    Explanations

    specific terms or technologies that are not commonly recognized or require clarification.

    The neuron detects the model’s expressions of apology and unfamiliarity (e.g. “I’m sorry, but I’m not familiar with…”).

    New Auto-Interp
    Negative Logits
    .pres
    -0.07
     equipped
    -0.06
     Hawth
    -0.06
     fract
    -0.06
    िछल
    -0.06
     Kosovo
    -0.06
    (commands
    -0.06
    ıyor
    -0.06
     pH
    -0.06
     wholesome
    -0.06
    POSITIVE LOGITS
     jur
    0.07
     prez
    0.07
     Components
    0.06
    /pub
    0.06
    plib
    0.06
     hus
    0.06
     لل
    0.06
     lumber
    0.06
     الملك
    0.06
     heal
    0.06
    Act Density 0.033%

    No Known Activations