    Negative Logits
        " religious"            -0.07
        " inauguration"         -0.07
        "شح"                    -0.07
        " notas"                -0.07
        " anyone"               -0.07
        (token not rendered)    -0.07
        " fertility"            -0.07
        " songs"                -0.07
        (token not rendered)    -0.07
        "тық"                   -0.07
    Positive Logits
        " privilegi"            0.09
        " ting"                 0.08
        " schlä"                0.08
        " kän"                  0.08
        " Taco"                 0.08
        (token not rendered)    0.07
        "ieden"                 0.07
        (token not rendered)    0.07
        " porcel"               0.07
        " bleed"                0.07
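SAE feature dashboards commonly build logit lists like these by projecting the feature's decoder direction through the model's unembedding matrix. They then keep the k most negative and k most positive vocabulary entries. The sketch below is written in pure Python and assumes that method. The helper name `logit_effects`, the `[d_model][vocab_size]` layout of `w_unembed`, and the toy inputs are all hypothetical. This is not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction onto the vocabulary (hypothetical helper).

    Assumes w_unembed is laid out as [d_model][vocab_size].
    Returns (most_negative, most_positive), each a list of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    if len(w_unembed) != d_model:
        raise ValueError("w_unembed must have one row per decoder dimension")
    # logits[j] = sum_i decoder_dir[i] * w_unembed[i][j]
    logits = [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    # Ascending order puts the most negative logits first.
    ranked = sorted(pairs, key=lambda p: p[1])
    most_negative = ranked[:k]
    # Reversing gives descending order, so the most positive come first.
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, -0.5],
    w_unembed=[[0.2, -0.1, 0.0], [0.4, 0.2, -0.2]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

In the toy example, token "b" receives the most negative logit and token "c" the most positive one. A real dashboard would apply the same projection over the full vocabulary with k = 10.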
    Activation density: 0.003%

    No known activations.
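The activation density figure above is read here as the share of tokens in a sample corpus on which the feature's activation is above zero. The sketch below assumes that definition. The helper name `activation_density` and the zero threshold are hypothetical. Under this reading, 0.003% means the feature fires on roughly 3 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens on which a feature's activation exceeds `threshold`.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens where the feature is considered to be firing.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 3 firing tokens out of 100,000 gives a density of 0.003%.
acts = [0.0] * 99_997 + [0.5, 1.2, 0.1]
print(f"{activation_density(acts):.3f}%")
```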