    Negative Logits
    Token          Logit
    " enjoyed"     -0.09
    "daughter"     -0.08
    " تأ"          -0.08
                   -0.08
    "فال"          -0.08
    " funeral"     -0.08
    " údaj"        -0.07
    " substant"    -0.07
    "rnd"          -0.07
    " destruct"    -0.07
    Positive Logits
    Token           Logit
    " обнаруж"      0.18
    " detects"      0.17
    "Detection"     0.17
    " detectar"     0.17
    " detecting"    0.16
    " detection"    0.16
    ".detect"       0.16
    "_detection"    0.16
    " Detection"    0.16
    "发现"          0.15
    Activation Density: 0.081%

    No Known Activations