INDEX
    Explanations

    negative punctuation or symbols indicating strong disapproval

    Negative Logits
    ungan         -0.16
    rish          -0.15
     Mali         -0.15
    .Metro        -0.14
    XObject       -0.14
     crossorigin  -0.14
    noc           -0.14
    .jet          -0.14
    ;br           -0.14
    ousel         -0.13
    Positive Logits
     Watches      0.27
     watches      0.26
     watch        0.23
    -watch        0.22
     Hy           0.21
     wearer       0.21
    watch         0.20
    _watch        0.20
     Basel        0.20
    Hy            0.20
    Act Density 0.000%

    No Known Activations