    Negative Logits
    Token                Logit
    "DataAnnotations"    -0.59
    "dct"                -0.52
    " Activision"        -0.50
    " POW"               -0.50
    " Spon"              -0.49
    " GLA"               -0.48
    "tzmann"             -0.48
    "ngth"               -0.48
    " опро"              -0.48
    " Tibetan"           -0.48
    Positive Logits
    Token                Logit
    " here"              1.87
    "here"               1.71
    "Here"               1.52
    " Here"              1.48
    " aquí"              1.46
    " HERE"              1.44
    "HERE"               1.34
    " здесь"             1.33
    " aqui"              1.28
    " tää"               1.26
    Activation Density: 0.032%

    No Known Activations