INDEX
    Explanations
    Negative Logits
     sexo            -0.06
     пацієн          -0.06
     incremented     -0.06
    ))*(             -0.06
     Adj             -0.06
    شنبه             -0.06
     emp             -0.06
     지난            -0.06
     oste            -0.06
     ↵               -0.06
    Positive Logits
     theory           0.14
     Theory           0.13
    Theory            0.11
     theories         0.11
     teor             0.10
     THEORY           0.09
    theory            0.08
                      0.08
    Strategy          0.08
     theorists        0.07
    Activation Density: 0.018%

    No Known Activations