    Negative Logits
    😛               0.78
     malicious       0.75
    😚               0.73
     harmful         0.73
     vouchers        0.71
    🔛               0.70
     রোববার          0.70
    🐽               0.69
     incomes         0.69
     recreational    0.69
    POSITIVE LOGITS
     Process         1.59
     Context         1.51
     Support         1.49
     Role            1.47
     Work            1.46
     Approach        1.44
     Setup           1.43
     Compet          1.43
     Function        1.42
     Relationship    1.40
    Act Density 6.495%

    No Known Activations