    Explanations

    This neuron detects occurrences of the word “insider” (and its variants) in the text.

    Negative Logits
    Token          Logit
    " Garcia"      -0.07
    " jose"        -0.07
    " کمک"         -0.07
    " Buildings"   -0.07
    " giám"        -0.07
    " empleado"    -0.06
    " ERP"         -0.06
    "rowning"      -0.06
    "_attempts"    -0.06
    " Linda"       -0.06
    Positive Logits
    Token          Logit
    " Insider"     0.10
    " insider"     0.09
    " insiders"    0.09
    " outsider"    0.07
    ".Hidden"      0.06
    "Back"         0.06
    "-sur"         0.06
    "inject"       0.06
    "hammer"       0.06
    "如下"         0.06
    Activation Density: 0.002%
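
To put the density figure in perspective, here is a minimal sketch, assuming that density means the fraction of tokens on which the neuron's activation is above zero. The page does not state the exact definition or threshold, so `activation_density`, its `threshold` parameter, and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the neuron fires on 2 tokens out of 100,000.
acts = [0.0] * 99_998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.002%
```

Under that assumption, a density of 0.002% corresponds to roughly 2 firing tokens per 100,000 tokens sampled.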

    No Known Activations