INDEX
    Explanations
    NEGATIVE LOGITS
    Token              Logit
    'ِن'               -0.08
    '_SENT'            -0.06
    'ماری'             -0.06
    '거래'             -0.06
    ' Βροχή'           -0.06
    ' Kohana'          -0.06
    ' 것이'            -0.06
    (no token text)    -0.06
    ' Fay'             -0.06
    ' برخورد'          -0.06
    POSITIVE LOGITS
    Token              Logit
    '(single'          0.07
    'Personal'         0.07
    'Liquid'           0.07
    ']"↵'              0.06
    'closure'          0.06
    'LAR'              0.06
    ' Urban'           0.06
    ',obj'             0.06
    '(nd'              0.06
    'ريقة'             0.06
    Activation Density: 0.045%

    No Known Activations