INDEX
    Explanations

    The neuron fires on structural or control tokens that mark conversation boundaries and speaker roles (e.g. start/end markers and speaker-ID tags).

    Negative Logits
    onga    -0.07
    ">'+    -0.06
    ;\↵    -0.06
    وسی    -0.06
    _prov    -0.06
    цій    -0.06
    ypsum    -0.06
     GETGLOBAL    -0.06
    ικοί    -0.06
     Guerr    -0.06
    Positive Logits
    ataloader    0.09
     load    0.07
     Bender    0.07
    ンの    0.07
    -port    0.06
     Loads    0.06
    WHAT    0.06
     navy    0.06
     sympathetic    0.06
     feeder    0.06
    Act Density 0.001%

    No Known Activations