    Explanations

    identifiers and numbers

    Negative Logits
    Logit   Token
    -0.78   ' эмо'
    -0.77   (token text missing)
    -0.77   ' logistique'
    -0.76   ' bouteille'
    -0.76   'setlength'
    -0.73   ' tertawa'
    -0.73   ' helados'
    -0.72   ' Saltar'
    -0.71   'معرفی'
    -0.70   ' initially'
    Positive Logits
    Logit   Token
    0.95    '="@+'
    0.86    ' ID'
    0.83    ' ids'
    0.79    ' уника'
    0.78    ' id'
    0.77    'lema'
    0.77    'льни'
    0.73    (token text missing)
    0.72    ' kä'
    0.71    'MSG'
    Act Density 0.008%

    No Known Activations