    Explanations

    The neuron activates on the placeholder tokens for character names (e.g. “NAME_1”, “NAME_4”) in the text.

    Negative Logits
    MK                 -0.08
    385                -0.07
     humiliating       -0.07
     pře               -0.07
    --------------↵    -0.07
    ترة                -0.06
    Success            -0.06
    Menu               -0.06
    												    -0.06
    werp               -0.06
    Positive Logits
     direccion         0.06
    ync                0.06
     synchronize       0.06
                       0.06
     BOOL              0.06
     difficile         0.06
    adr                0.06
    ifikasi            0.06
    íl                 0.06
    <Resource          0.06
    Act Density 0.027%

    No Known Activations