INDEX
    Explanations
    NEGATIVE LOGITS
    " DY"            -0.07
    "_ED"            -0.07
    " corridors"     -0.06
    "edula"          -0.06
    " mo"            -0.06
    " stej"          -0.06
    "亿"             -0.06
    " Prairie"       -0.06
    "<r"             -0.05
    " hay"           -0.05
    POSITIVE LOGITS
    " McDonald"      0.08
    " stereotypes"   0.07
    " Bras"          0.07
    " reductions"    0.07
    "'))↵"           0.07
    "autos"          0.07
    " MacDonald"     0.07
    " john"          0.07
    " John"          0.07
    ".k"             0.06
    Activation Density: 0.003%

    No Known Activations