INDEX
    Negative Logits
     zum            -0.07
     expectations   -0.07
    asurable        -0.07
    .assertFalse    -0.06
     آرام           -0.06
     enjoyment      -0.06
     Diagnosis      -0.06
    imin            -0.06
     remed          -0.06
     Teen           -0.06
    Positive Logits
    _Metadata       0.07
    fight           0.06
     rozsah         0.06
    Prom            0.06
     fileSize       0.06
     Cute           0.06
     travelling     0.06
    ERRY            0.06
    erence          0.06
     Πρω            0.06
    Act Density 0.000%

    No Known Activations