INDEX
    Explanations

    phrases indicating types or classifications of things

    Negative Logits
    het             -0.64
     special        -0.62
     dit            -0.59
     von            -0.55
    HET             -0.55
     doch           -0.51
     Food           -0.51
    وت              -0.49
     Sy             -0.49
    İstinadlar      -0.49
    Positive Logits
     itſelf           0.76
    ReusableCell      0.73
     defaultstate     0.72
    حياته             0.71
     Мексичка         0.70
    المناصب           0.70
     disambiguazione  0.68
    esterno           0.68
    openzeppelin      0.68
     Inscrivez        0.67
    Activation Density: 0.008%

    No Known Activations