    Negative Logits
        " poorer"          -0.07
        " hw"              -0.06
        " interpre"        -0.06
        " production"      -0.06
        " sz"              -0.06
        "-ps"              -0.06
        " всп"             -0.06
        "(ed"              -0.06
        " Outputs"         -0.06
        " inputs"          -0.06
    Positive Logits
        " Developer"       0.25
        "Developer"        0.18
        " developer"       0.14
        "veloper"          0.08
        " archaeological"  0.08
        "ponsive"          0.07
        " Developers"      0.07
        "opher"            0.06
        "ξεις"             0.06
        (token not shown)  0.06
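The two lists above rank vocabulary tokens by a per-token score for this feature. A minimal sketch of how such lists can be produced follows, assuming (as is common for sparse-autoencoder dashboards, but not stated on this page) that each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix. The helpers `feature_logits` and `top_logits` are hypothetical illustrations, not a real library API; the usage example reuses a few scores copied from the lists.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]


def feature_logits(decoder_dir: Sequence[float],
                   unembed: Sequence[Sequence[float]],
                   vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: project a feature's decoder direction
    (length d_model) through an unembedding matrix (d_model rows x
    len(vocab) columns) to get one score per token. Assumes this is
    how the dashboard scores are derived."""
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_logits(logits: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most positive and the k most
    negative (token, score) pairs, strongest first."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by score
    negative = [kv for kv in ranked if kv[1] < 0][:k]  # most negative first
    positive = [kv for kv in reversed(ranked) if kv[1] > 0][:k]  # most positive first
    return positive, negative


# Usage with a few scores copied from the lists above.
scores = {" Developer": 0.25, "Developer": 0.18, " developer": 0.14,
          " poorer": -0.07, " hw": -0.06, " inputs": -0.06}
pos, neg = top_logits(scores, k=2)
print(pos)  # [(' Developer', 0.25), ('Developer', 0.18)]
print(neg)  # [(' poorer', -0.07), (' hw', -0.06)]
```

Quoting each token keeps its leading space visible, which matters here: " Developer" and "Developer" are distinct tokens with different scores.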
    Activation Density: 0.006%
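An activation density of 0.006% means the feature fires on roughly 6 of every 100,000 tokens. The sketch below computes density as the percentage of token positions whose activation exceeds a threshold; the function name `activation_density` and the default threshold of 0 are assumptions for illustration, not the dashboard's documented definition.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the
    feature's activation is strictly greater than `threshold`
    (threshold of 0 is an assumption)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# A feature that fires on 6 of 100,000 tokens has a density of 0.006%.
acts = [0.0] * 99_994 + [1.3] * 6
print(activation_density(acts))  # 0.006
```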

    No Known Activations