    Negative Logits
    Token           Logit
    " fear"         -0.07
    ";;;"           -0.06
    " seventy"      -0.06
    " страх"        -0.06
    " Пло"          -0.06
    " Dominion"     -0.06
    " فناوری"       -0.06
    "cm"            -0.06
    " عضو"          -0.06
    " uploaded"     -0.06

    Positive Logits
    Token           Logit
    "ptoms"         0.07
    " слыш"         0.07
    ".backward"     0.06
    " -->↵"         0.06
    "(ec"           0.06
    " entityType"   0.06
    "CLICK"         0.06
    " Coy"          0.06
    "‌ها"            0.06
    " zku"          0.06
    Activation Density: 0.002%

    No Known Activations