INDEX
    Explanations
    Negative Logits
     neoliberal     -0.08
     edgy           -0.08
     transact       -0.08
     बाध            -0.08
    Nu              -0.08
    Np              -0.08
    ^(              -0.07
    TRIES           -0.07
    _cycles         -0.07
     fig            -0.07
    Positive Logits
    روفة            0.09
     Labrador       0.09
     কোম            0.08
     Alexandra      0.08
     forgiving      0.08
    uscious         0.08
     lovable        0.08
    黄色            0.08
    çao             0.08
    -loving         0.08
    Activation Density: 0.002%

    No Known Activations