INDEX
    Explanations

    controlled experiments and variables

    Negative Logits
    Token           Logit
    " Vladimir"     -0.08
    " pension"      -0.08
    ".extend"       -0.08
    "PRI"           -0.07
    "Princess"      -0.07
    "alwa"          -0.07
    "Fuse"          -0.07
    " pyg"          -0.07
    (blank)         -0.07
    "czę"           -0.07
    Positive Logits
    Token             Logit
    " controlled"     0.13
    "Controlled"      0.13
    " Controlled"     0.12
    "controlled"      0.12
    " gecontrole"     0.11
    " rigor"          0.11
    "-controlled"     0.11
    " Treatments"     0.10
    " controlar"      0.10
    " эксперимент"    0.10
    Activation Density: 0.027%

    No Known Activations