INDEX
    Explanations
    Negative Logits
    Token          Logit
    " Reward"      -0.06
    " budgets"     -0.06
    " ($"          -0.06
    "ид"           -0.06
    "иг"           -0.06
    " Blood"       -0.06
    " severity"    -0.06
    " проблем"     -0.06
    " pady"        -0.06
    " yup"         -0.06
    Positive Logits
    Token          Logit
    " tan"         0.08
    " Strange"     0.08
                   0.07
    "ालन"          0.07
    " stranger"    0.07
    " strange"     0.07
    "AGE"          0.07
    "غاز"          0.06
    " Intr"        0.06
    " ван"         0.06
    Activation Density: 0.007%

    No Known Activations