INDEX
    Explanations

    This neuron activates on occurrences of the word “flag” (and closely related tokens), i.e. it detects mentions of flags.

    Negative Logits
     vulnerabilities  -0.07
    ouve              -0.07
    676               -0.06
     emphasizes       -0.06
    dh                -0.06
    839               -0.06
    cem               -0.06
    hor               -0.06
    Tomorrow          -0.06
     tissues          -0.05
    Positive Logits
     अम       0.07
              0.07
     برنامه   0.07
     knights  0.07
    lassian   0.07
    ="./      0.06
    panel     0.06
     매매       0.06
              0.06
    >/<       0.06
    Act Density 0.009%

    No Known Activations