INDEX
    Explanations

    The neuron detects the phrase “viewed in the light most favorable to [the X]”, which is used to state the standard of review for evidence.

    Negative Logits
    "报告"  -0.07
    "_runner"  -0.07
    " conte"  -0.07
    "Homepage"  -0.06
    " دور"  -0.06
    "439"  -0.06
    "passwd"  -0.06
    " Cri"  -0.06
    "Looking"  -0.06
    "uper"  -0.06
    Positive Logits
    " заболеваний"  0.07
    " dro"  0.07
    " πολι"  0.07
    "/dev"  0.07
    "greg"  0.06
    "<Button"  0.06
    " marine"  0.06
    "κυ"  0.06
    " Apple"  0.06
    "(ValueError"  0.06
    Activation Density: 0.003%

    No Known Activations