INDEX
    Explanations

    phrases related to types, categories, or classifications

    NEGATIVE LOGITS
    ****************    -0.59
                        -0.58
    paroles             -0.55
     Saltar             -0.53
    TF                  -0.48
     κάθε               -0.48
     preços             -0.47
     Савезне            -0.47
     每                  -0.47
     partea             -0.46
    POSITIVE LOGITS
    ldorf               0.65
    sterone             0.61
    pośred              0.61
    واد                 0.60
     mắn                0.59
    WebServlet          0.58
    hubanes             0.57
    gnore               0.54
    orianCalendar       0.54
    IMPORTED            0.53
    Act Density 0.667%

    No Known Activations