    Negative Logits
    Token            Logit
     gameplay        -0.07
     economy         -0.07
     Economy         -0.07
     Leaders         -0.06
     tablesp         -0.06
    ôi               -0.06
     prized          -0.06
    istrat           -0.06
     preventative    -0.06
     segmented       -0.06
    Positive Logits
    Token            Logit
     fc              0.07
     WX              0.07
     أنها            0.07
     NEC             0.07
                     0.06
     whims           0.06
    "He              0.06
    工作               0.06
    .logger          0.06
     pil             0.06
    Activation Density: 0.025%

    No Known Activations