INDEX
    Negative Logits
    Token              Logit
    "auh"              -0.09
    " estable"         -0.08
    "Bride"            -0.08
    " Porque"          -0.08
    " ONLINE"          -0.08
    (token missing)    -0.08
    " Turks"           -0.08
    ".coordinate"      -0.08
    (token missing)    -0.08
    " bie"             -0.08
    Positive Logits
    Token              Logit
    "اعت"              0.08
    "लेक"              0.07
    "ämp"              0.07
    (token missing)    0.07
    " capitalist"      0.07
    "êm"               0.07
    "-button"          0.07
    " welfare"         0.07
    (token missing)    0.07
    " liberal"         0.07
    Activation Density: 0.003%

    No Known Activations