    Negative Logits
    Token             Logit
    "enj"             -0.08
    " creatives"      -0.08
    "ngi"             -0.08
    [blank]           -0.08
    " fueling"        -0.08
    " creativity"     -0.08
    " creatividad"    -0.07
    "Creative"        -0.07
    " breakthroughs"  -0.07
    [blank]           -0.07
    Positive Logits
    Token             Logit
    "fair"            0.09
    " fairness"       0.09
    " fair"           0.08
    "vira"            0.08
    "านุ"             0.07
    " Engl"           0.07
    "clusive"         0.07
    "ibility"         0.07
    "əc"              0.07
    "ador"            0.07
    Activation Density: 0.003%

    No Known Activations