INDEX
    Explanations
    Negative Logits
     ujednoznacz      -0.75
    bewerken          -0.65
    esModule          -0.59
    ftagPool          -0.57
     bezeichneter     -0.57
    سياسي             -0.57
     consuming        -0.57
    Serviço           -0.56
    AccessorTable     -0.56
     Neuf             -0.56
    Positive Logits
    ControllerAdvice   0.52
     mat               0.43
    numerusform        0.42
     समीक्षक            0.40
     mats              0.40
    class              0.40
     Wiktionnaire      0.40
     @"/               0.39
     Commencez         0.39
    bastien            0.38
    Activation Density: 0.002%

    No Known Activations