    Explanations

    foreign languages

    This neuron activates on non-English text segments, especially tokens with diacritics or Cyrillic characters.

    Negative Logits
    Token             Logit
    Coefficient       -0.07
    ็ด                -0.07
    ーツ              -0.07
    Traits            -0.07
     CPPUNIT          -0.06
    SubMenu           -0.06
     pretext          -0.06
    итет              -0.06
     whistleblower    -0.06
                      -0.06
    Positive Logits
    Token             Logit
     /*!↵             0.07
     некоторых        0.06
     skl              0.06
     tidak            0.06
    結果              0.06
     metodo           0.06
     ceux             0.06
    -[                0.06
    [${               0.06
    *******           0.06
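
The page does not say how its logit lists are computed. One common approach is to project a neuron's output direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below shows that idea on toy data. The function names (`logit_effects`, `top_tokens`), the vocabulary, and all numbers are hypothetical and are not taken from this neuron.

```python
def logit_effects(neuron_out, unembed_rows):
    """Dot the neuron's output direction with each token's unembedding vector.

    neuron_out   -- list of floats (assumed output weights of the neuron)
    unembed_rows -- list of float lists, one vector per vocabulary token
    """
    return [sum(w * u for w, u in zip(neuron_out, row)) for row in unembed_rows]


def top_tokens(vocab, effects, k=10):
    """Return (most negative k, most positive k) as lists of (token, effect)."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest effects first
    positive = ranked[::-1][:k]    # largest effects first
    return negative, positive


# Toy 2-dimensional example (made-up values, for illustration only).
toy_vocab = ["a", "b", "c", "d"]
toy_unembed = [[0.5, 0.0], [-0.25, 1.0], [0.0, 2.0], [0.25, -1.0]]
toy_neuron_out = [1.0, 0.0]

effects = logit_effects(toy_neuron_out, toy_unembed)
negative, positive = top_tokens(toy_vocab, effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

Under this assumption, a token in the positive list is one whose logit the neuron pushes up when it fires, and a token in the negative list is one it pushes down.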
    Activation Density: 0.069%
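
An activation density figure is usually read as the fraction of sampled tokens on which the unit fires. The page does not define it, so the sketch below assumes "fires" means an activation strictly above a threshold of zero. The helper names and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction, digits=3):
    """Format a fraction as a percentage string with a fixed number of decimals."""
    return f"{100 * fraction:.{digits}f}%"


# Toy sample: one of four tokens is active (made-up values).
sample = [0.0, 0.0, 1.5, 0.0]
print(as_percent(activation_density(sample)))
```

Read this way, a density of 0.069% would mean the unit is active on roughly 7 of every 10,000 sampled tokens.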

    No Known Activations