    Negative Logits
     sut          -0.08
     сут          -0.08
    rollen        -0.08
                  -0.08
    Fold          -0.07
     Jour         -0.07
     Händ         -0.07
     dolls        -0.07
     hedge        -0.07
     Fold         -0.07
    Positive Logits
     scant         0.08
    verb           0.08
    ari            0.08
    ascus          0.08
    że             0.08
     apartheid     0.07
     Lenn          0.07
    ARI            0.07
    arro           0.07
                   0.07
    Activation Density: 0.006%

    No Known Activations