    Explanations

    The neuron activates on qualifying words in instructions, most prominently “relevant” and similar qualifiers such as “possible”.

    Negative Logits
    Token         Logit
    /alert        -0.07
    stalk         -0.07
     DIS          -0.07
    ,看           -0.07
    criptions     -0.06
    dül           -0.06
    Where         -0.06
                  -0.06
     discs        -0.06
                  -0.06
    Positive Logits
    Token         Logit
     také         0.07
     lonely       0.07
     اجازه        0.07
    .toFloat      0.07
    无码          0.06
    Λ             0.06
     mdl          0.06
     possível     0.06
     Ngân         0.06
     değildir     0.06
    Act Density 0.038%

    No Known Activations