    Negative Logits
        Token                   Logit
        " anecdotes"            -0.07
        (token not rendered)    -0.07
        " diagnose"             -0.07
        " JA"                   -0.07
        " anecd"                -0.06
        "_cum"                  -0.06
        " лож"                  -0.06
        " ули"                  -0.06
        "52"                    -0.06
        "ович"                  -0.06
    Positive Logits
        Token                   Logit
        " power"                0.17
        "Power"                 0.16
        " Power"                0.16
        "power"                 0.12
        " powers"               0.11
        "POWER"                 0.11
        "(power"                0.11
        "-power"                0.11
        " POWER"                0.11
        ".power"                0.11
    Activation Density: 0.050%

    No Known Activations
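    Activation density is usually defined as the fraction of sampled token positions on which the feature's activation exceeds zero. Under that assumption, 0.050% corresponds to roughly 5 firing positions per 10,000 tokens. The function name `activation_density` and its `threshold` parameter below are hypothetical, chosen for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions where the feature fires above threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: 5 firing positions out of 10,000 tokens.
sample = [1.0] * 5 + [0.0] * 9995
print(format(activation_density(sample), ".3%"))  # prints 0.050%
```

    The `.3%` format specifier multiplies by 100 and keeps three decimals, which matches how the density is displayed on this page.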