INDEX
    Explanations

    definitions

    Negative Logits
    "UBA"              -0.08
    " unterschiedlich"  -0.08
    " joyful"          -0.08
    " baos"            -0.08
    " Verkaufs"        -0.08
    " verplicht"       -0.08
    ".za"              -0.08
    "AMESPACE"         -0.07
    " Verbesser"       -0.07
    " happier"         -0.07
    Positive Logits
    " adjacency"       0.08
    " closest"         0.08
    " meio"            0.08
    " excluded"        0.08
    "Closest"          0.08
    "closest"          0.08
    " insult"          0.08
    " दूर"             0.08
    " pitk"            0.08
    " हट"              0.08
    Activation Density: 0.018%
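Activation density is presumably the percentage of sampled tokens on which the feature fires, i.e. has an activation above zero; 0.018% would then mean roughly 18 firing tokens per 100,000. The sketch below assumes that definition and a threshold of 0; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 18 of 100,000 tokens.
    sample = [1.0] * 18 + [0.0] * 99_982
    print(f"{activation_density(sample):.3f}%")
```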

    No Known Activations