INDEX
    Explanations
    Negative Logits
    Token         Logit
    " ALT"        -0.07
    " цій"        -0.07
    "erm"         -0.07
    "หลวง"        -0.07
    " وخ"         -0.07
    ".Manager"    -0.07
    "ْر"          -0.06
    " congen"     -0.06
    "_PR"         -0.06
    " Cannes"     -0.06
    Positive Logits
    Token         Logit
    " tobacco"    0.10
    " Tobacco"    0.10
    " Nordic"     0.08
    " turbines"   0.08
    " narcotics"  0.07
    "dro"         0.07
    " radio"      0.07
                  0.07
    "озд"         0.07
    " health"     0.07
    Activation Density: 0.003%

    No Known Activations