    NEGATIVE LOGITS
    Token             Logit
    "terror"          -0.08
    "Technology"      -0.07
    "문화"            -0.07
    "landscape"       -0.07
    "איש"             -0.07
                      -0.07
                      -0.07
    " совершенно"     -0.07
    " uncert"         -0.07
    "rance"           -0.06
    POSITIVE LOGITS
    Token             Logit
    "执导"            0.08
    "Related"         0.07
    " Based"          0.07
    "部位"            0.07
    " groin"          0.06
    "=min"            0.06
    ".di"             0.06
    " pil"            0.06
    "领导小组"        0.06
    " dam"            0.06
    Activation Density: 0.000%

    No Known Activations